high performance computing on graphics processing units: hgpu.org

GPU-to-CPU callbacks

Jeff A. Stuart, Michael Cox, John D. Owens

University of California, Davis

Euro-Par 2010 Workshops: Proceedings of the Third Workshop on UnConventional High Performance Computing (UCHPC 2010), volume 6586 of Lecture Notes in Computer Science, pages 365-372, Springer, 2010

DOI:10.1007/978-3-642-21878-1_45

We present GPU-to-CPU callbacks, a new mechanism and abstraction for GPUs that offers them more independence in a heterogeneous computing environment. Specifically, we provide a method for GPUs to issue callback requests to the CPU. These requests serve as a tool for ease of use, future-proofing of code, and new functionality. We classify these requests into three categories: system calls (e.g., network and file I/O), device/host memory transfers, and CPU compute, and we explain why each is important. We show how to implement such a mechanism in CUDA using pinned system memory and discuss possible GPU-driver features that would remove the need for polling, making callbacks more efficient in CPU usage and power consumption. We implement several examples demonstrating the use of callbacks for file I/O, network I/O, memory allocation, and debugging.
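The abstract describes the mechanism only at a high level, so the following is a rough, host-only sketch of the polling handshake it implies, not the authors' implementation. A `std::thread` stands in for the GPU kernel and an ordinary struct stands in for the pinned system memory that both sides can see. The names `Mailbox`, `issue_callback`, `serve_callbacks`, and `run_callback_demo`, the three-state flag, and the "square the argument" service are all assumptions made for illustration. In real CUDA code the mailbox would presumably live in mapped pinned host memory and the device side would need system-wide memory fences.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical mailbox states (not taken from the paper).
enum State : std::uint32_t { kIdle = 0, kPending = 1, kDone = 2 };

// Stand-in for a block of pinned memory visible to both CPU and GPU.
struct Mailbox {
    std::atomic<std::uint32_t> state{kIdle};
    std::int64_t arg = 0;     // written by the "device", read by the host
    std::int64_t result = 0;  // written by the host, read by the "device"
};

// "Device" side: post a request, then spin until the host has answered.
std::int64_t issue_callback(Mailbox& box, std::int64_t arg) {
    box.arg = arg;
    box.state.store(kPending, std::memory_order_release);  // publish request
    while (box.state.load(std::memory_order_acquire) != kDone)
        std::this_thread::yield();
    std::int64_t r = box.result;
    box.state.store(kIdle, std::memory_order_release);     // free the mailbox
    return r;
}

// Host side: poll for requests and service a fixed number of them.
// The "service" here just squares the argument; a real handler might
// perform file I/O, network I/O, or a memory allocation instead.
void serve_callbacks(Mailbox& box, int count) {
    for (int i = 0; i < count; ++i) {
        while (box.state.load(std::memory_order_acquire) != kPending)
            std::this_thread::yield();                      // polling loop
        box.result = box.arg * box.arg;
        box.state.store(kDone, std::memory_order_release);  // publish reply
    }
}

// Runs a simulated kernel that issues three callbacks and sums the replies.
std::int64_t run_callback_demo() {
    Mailbox box;
    std::int64_t sum = 0;
    std::thread device([&] {
        for (std::int64_t x = 1; x <= 3; ++x) sum += issue_callback(box, x);
    });
    serve_callbacks(box, 3);  // the calling thread plays the CPU
    device.join();
    assert(box.state.load() == kIdle);  // mailbox is free again
    return sum;               // 1*1 + 2*2 + 3*3
}
```

Calling `run_callback_demo()` from `main` returns 14. The host's spin loop in `serve_callbacks` is the polling cost that the abstract suggests driver support could remove.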

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 280, Performance, Programming techniques

October 16, 2011 by hgpu
