A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality
Visualisierungsinstitut der Universität Stuttgart, Stuttgart, Germany
IEEE Transactions on Visualization and Computer Graphics, 2008
@article{muller2008compute,
title={A compute unified system architecture for graphics clusters incorporating data locality},
author={M{\"u}ller, C. and Frey, S. and Strengert, M. and Dachsbacher, C. and Ertl, T.},
journal={IEEE Transactions on Visualization and Computer Graphics},
pages={605--617},
year={2008},
publisher={IEEE Computer Society}
}
We present a development environment for distributed GPU computing targeted at multi-GPU systems as well as graphics clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely, the PCI bus and network interconnects. While the extended API mimics the full function set of current graphics hardware, including the concept of global memory, on all distribution layers, the underlying communication mechanisms are handled transparently for the application developer. To allow for high scalability, in particular for network-interconnected environments, we introduce an automatic GPU-accelerated scheduling mechanism that is aware of data locality. This way, the overall amount of transmitted data can be heavily reduced, which leads to better GPU utilization and faster execution. We evaluate the performance and scalability of our system for bus and especially network-level parallelism on typical multi-GPU systems and graphics clusters.
May 30, 2011 by hgpu