Zero-copy I/O processing for low-latency GPU computing
Department of Information Engineering, Nagoya University; Department of Computer Science, University of California, Santa Cruz
ICCPS’13, April 8-11, 2013, Philadelphia, PA, USA
@inproceedings{kato2013zero,
  title={Zero-copy I/O processing for low-latency GPU computing},
  author={Kato, Shinpei and Aumiller, Jason and Brandt, Scott},
  booktitle={Proceedings of the ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS)},
  year={2013}
}
Cyber-physical systems (CPS) aim to monitor and control complex real-world phenomena in which computational cost and real-time constraints pose a major challenge. Many-core hardware accelerators such as graphics processing units (GPUs) promise to enhance computation by leveraging the data parallelism often found in real-world scenarios of CPS, but their performance is limited by the overhead of data transfers between the host and the device memory. For example, plasma control in the HBT-EP Tokamak device at Columbia University must execute its control algorithm within a few microseconds, yet copying the data set between the host and the device memory may take tens of microseconds. This paper presents a zero-copy I/O processing scheme that maps the I/O address space of the system to the virtual address space of the compute device, allowing sensors and actuators to transfer data to and from the compute device directly. Experiments using the plasma control system show a 33% reduction in computational cost, and microbenchmarks with more generic matrix operations show a 34% reduction, while in both cases effective data throughput remains at least as good as the current best performers.
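The paper's scheme maps the system's I/O address space into the GPU's virtual address space at the driver level, which stock CUDA cannot express directly. As an illustration only of the zero-copy idea, the following minimal sketch uses the closest standard-CUDA analogue: mapped ("pinned, zero-copy") host memory that a kernel reads and writes without any cudaMemcpy. The kernel, buffer size, and gain value are hypothetical and are not taken from the paper.

// Illustrative sketch of zero-copy access from a GPU kernel using mapped
// pinned host memory (standard CUDA runtime API). The paper itself maps
// I/O device memory, not host memory; this only demonstrates the concept
// of eliminating explicit host-device copies. All names below are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(const float *in, float *out, int n, float gain) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = gain * in[i];   // kernel reads host-resident data directly
}

int main() {
    const int n = 1 << 16;
    cudaSetDeviceFlags(cudaDeviceMapHost);           // enable mapped host memory

    float *h_in, *h_out, *d_in, *d_out;
    cudaHostAlloc(&h_in,  n * sizeof(float), cudaHostAllocMapped);
    cudaHostAlloc(&h_out, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_in,  h_in,  0);      // device-visible aliases of the
    cudaHostGetDevicePointer(&d_out, h_out, 0);      // same buffers, no copies made

    for (int i = 0; i < n; ++i) h_in[i] = (float)i;  // stand-in for sensor input

    scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);
    cudaDeviceSynchronize();                         // results already visible in h_out

    printf("h_out[42] = %f\n", h_out[42]);
    return 0;
}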
April 16, 2013 by hgpu