high performance computing on graphics processing units: hgpu.org

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

Christoph Müller, Steffen Frey, Magnus Strengert, Carsten Dachsbacher, Thomas Ertl

Visualisierungsinstitut der Universität Stuttgart, Stuttgart, Germany

IEEE Transactions on Visualization and Computer Graphics, 2008

DOI:10.1109/TVCG.2008.188

@article{muller2008compute,
  title={A compute unified system architecture for graphics clusters incorporating data locality},
  author={M{\"u}ller, C. and Frey, S. and Strengert, M. and Dachsbacher, C. and Ertl, T.},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  pages={605--617},
  year={2008},
  publisher={Published by the IEEE Computer Society}
}

We present a development environment for distributed GPU computing targeted for multi-GPU systems, as well as graphics clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely, the PCI bus and network interconnects. While the extended API mimics the full function set of current graphics hardware (including the concept of global memory) on all distribution layers, the underlying communication mechanisms are handled transparently for the application developer. To allow for high scalability, in particular for network-interconnected environments, we introduce an automatic GPU-accelerated scheduling mechanism that is aware of data locality. This way, the overall amount of transmitted data can be heavily reduced, which leads to better GPU utilization and faster execution. We evaluate the performance and scalability of our system for bus and especially network-level parallelism on typical multi-GPU systems and graphics clusters.
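The abstract does not describe the scheduling algorithm itself, so the following is only a minimal sketch of what a locality-aware assignment could look like, not the authors' method. It assumes a simplified model in which each thread block reads a known set of memory pages and each cluster node already holds some pages; the names `schedule_blocks`, `round_robin`, `block_pages`, `node_pages`, and `max_blocks_per_node` are hypothetical and chosen for illustration. The sketch runs on the CPU in plain Python, whereas the paper's scheduler is GPU-accelerated.

```python
from collections import defaultdict


def schedule_blocks(block_pages, node_pages, max_blocks_per_node):
    """Greedily place each block on the node that already holds most of its pages.

    block_pages: dict block_id -> set of page ids the block reads (assumed known)
    node_pages:  dict node_id  -> set of page ids resident on that node
    Returns (assignment, transferred); transferred counts pages that had to be
    copied to a node that did not hold them yet.
    """
    resident = {node: set(pages) for node, pages in node_pages.items()}
    load = defaultdict(int)
    assignment = {}
    transferred = 0
    for block in sorted(block_pages):
        pages = block_pages[block]
        candidates = [n for n in sorted(resident) if load[n] < max_blocks_per_node]
        if not candidates:
            raise ValueError("not enough node capacity for all blocks")
        # Fewest missing pages wins; ties go to the less loaded node.
        best = min(candidates, key=lambda n: (len(pages - resident[n]), load[n]))
        transferred += len(pages - resident[best])
        resident[best] |= pages  # copied pages stay cached for later blocks
        load[best] += 1
        assignment[block] = best
    return assignment, transferred


def round_robin(block_pages, node_pages):
    """Locality-unaware baseline: deal blocks out to the nodes in turn."""
    resident = {node: set(pages) for node, pages in node_pages.items()}
    nodes = sorted(resident)
    assignment = {}
    transferred = 0
    for i, block in enumerate(sorted(block_pages)):
        node = nodes[i % len(nodes)]
        transferred += len(block_pages[block] - resident[node])
        resident[node] |= block_pages[block]
        assignment[block] = node
    return assignment, transferred


# Toy input: four blocks, two nodes, each node holding one half of the data.
blocks = {0: {0, 1}, 1: {1, 2}, 2: {4, 5}, 3: {5, 6}}
nodes = {"a": {0, 1, 2}, "b": {4, 5, 6}}

aware_assignment, aware_cost = schedule_blocks(blocks, nodes, max_blocks_per_node=2)
naive_assignment, naive_cost = round_robin(blocks, nodes)
print(aware_cost, naive_cost)  # prints "0 4"
```

On this toy input the locality-aware placement moves no pages, while the round-robin baseline moves four, which illustrates the kind of reduction in transmitted data that the abstract refers to. The per-node cap is one simple way to keep the greedy choice from piling all work onto a single node.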

Tags: Computer science, CUDA, GPU cluster, nVidia, nVidia GeForce 8800 GT, nVidia GeForce 8800 GTX, nVidia GeForce GTX 280, nVidia Quadro FX 5600, Performance

May 30, 2011 by hgpu
