high performance computing on graphics processing units: hgpu.org


Zero-copy I/O processing for low-latency GPU computing

Shinpei Kato, Jason Aumiller, Scott Brandt

Department of Information Engineering, Nagoya University; Department of Computer Science, University of California, Santa Cruz

ICCPS’13, April 8-11, 2013, Philadelphia, PA, USA

Cyber-physical systems (CPS) aim to monitor and control complex real-world phenomena, where computational cost and real-time constraints can pose a major challenge. Many-core hardware accelerators such as graphics processing units (GPUs) promise to accelerate computation by leveraging the data parallelism often found in real-world CPS scenarios, but their performance is limited by the overhead of transferring data between host and device memory. For example, plasma control in the HBT-EP Tokamak device at Columbia University must execute the control algorithm in a few microseconds, yet copying the data set between host and device memory may take tens of microseconds. This paper presents a zero-copy I/O processing scheme that maps the I/O address space of the system into the virtual address space of the compute device, allowing sensors and actuators to transfer data to and from the compute device directly. Experiments using the plasma control system show a 33% reduction in computational cost, and microbenchmarks with more generic matrix operations show a 34% reduction; in both cases, effective data throughput remains at least as good as that of the current best performers.
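The abstract's core argument is that when the control computation takes only a few microseconds, explicit host-to-device and device-to-host copies can dominate end-to-end latency. The back-of-envelope model below is a minimal sketch of that reasoning, not the authors' method or measurements: the class `Pipeline`, its field names, the multiplicative `zero_copy_slowdown` factor (an assumed penalty for the kernel reaching across the I/O bus instead of device memory), and all timing values are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """Hypothetical per-iteration timings in microseconds (illustrative only)."""
    host_to_device_us: float  # explicit copy of sensor data into device memory
    kernel_us: float          # control computation when data already sits in device memory
    device_to_host_us: float  # explicit copy of results back out for the actuators
    zero_copy_slowdown: float = 1.0  # assumed kernel slowdown when accessing mapped I/O memory


def copy_based_latency(p: Pipeline) -> float:
    """Copy in, compute, copy out: the three stages run back to back."""
    return p.host_to_device_us + p.kernel_us + p.device_to_host_us


def zero_copy_latency(p: Pipeline) -> float:
    """No staging copies; the kernel touches mapped I/O memory directly,
    which this simplified model charges as a multiplicative slowdown."""
    return p.kernel_us * p.zero_copy_slowdown


def break_even_slowdown(p: Pipeline) -> float:
    """Kernel slowdown factor above which zero-copy stops paying off."""
    return copy_based_latency(p) / p.kernel_us


if __name__ == "__main__":
    # Placeholder numbers in the spirit of "a few microseconds" of compute
    # versus "tens of microseconds" of copying; not taken from the paper.
    p = Pipeline(host_to_device_us=20.0, kernel_us=5.0,
                 device_to_host_us=20.0, zero_copy_slowdown=2.0)
    print(f"copy-based: {copy_based_latency(p):.1f} us")
    print(f"zero-copy:  {zero_copy_latency(p):.1f} us")
    print(f"zero-copy wins while slowdown < {break_even_slowdown(p):.1f}x")
```

With these placeholder values the script prints 45.0 us for the copy-based path, 10.0 us for the zero-copy path, and a break-even slowdown of 9.0x. Real zero-copy access costs depend on access patterns and bus behavior and are not a simple constant factor; the model only shows why removing staging copies matters most when the computation itself is short.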

Tags: Algorithms, Benchmarking, CUDA, Data parallelism, nVidia, Plasma physics

April 16, 2013 by hgpu
