high performance computing on graphics processing units: hgpu.org

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

Christoph Müller, Steffen Frey, Magnus Strengert, Carsten Dachsbacher, Thomas Ertl

Visualisierungsinstitut der Universität Stuttgart, Stuttgart, Germany

IEEE Transactions on Visualization and Computer Graphics, 2008

DOI:10.1109/TVCG.2008.188

We present a development environment for distributed GPU computing that targets multi-GPU systems as well as graphics clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely the PCI bus and network interconnects. The extended API mimics the full function set of current graphics hardware, including the concept of global memory, on all distribution layers, while the underlying communication mechanisms are handled transparently for the application developer. To allow for high scalability, particularly in network-interconnected environments, we introduce an automatic GPU-accelerated scheduling mechanism that is aware of data locality. This substantially reduces the overall amount of transmitted data, which leads to better GPU utilization and faster execution. We evaluate the performance and scalability of our system for bus-level and especially network-level parallelism on typical multi-GPU systems and graphics clusters.
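The abstract does not say how the locality-aware scheduler works internally. As a rough illustration of why placing work next to its data reduces traffic, the following Python sketch compares a greedy locality-aware assignment with plain round-robin. It is not the paper's GPU-accelerated scheduler. The function names, block IDs, and sizes are all hypothetical, and the cost model (bytes of input not yet resident on a node) is an assumption made for this example.

```python
def missing_bytes(blocks, resident_blocks):
    """Bytes a node would have to receive before it could run a task."""
    return sum(size for block, size in blocks.items()
               if block not in resident_blocks)


def schedule_locality_aware(tasks, resident, num_nodes):
    """Greedy assignment: prefer the node that needs the least new data.

    tasks:    list of (task_id, {block_id: size_in_bytes})
    resident: {node_index: set of block_ids already stored on that node}
    Ties are broken by current load, then by node index.
    Returns (assignment, total_bytes_transferred).
    """
    resident = {n: set(resident.get(n, ())) for n in range(num_nodes)}
    load = [0] * num_nodes
    assignment, transferred = {}, 0
    for task_id, blocks in tasks:
        best = min(
            range(num_nodes),
            key=lambda n: (missing_bytes(blocks, resident[n]), load[n], n),
        )
        transferred += missing_bytes(blocks, resident[best])
        resident[best].update(blocks)  # fetched blocks stay cached on the node
        load[best] += 1
        assignment[task_id] = best
    return assignment, transferred


def schedule_round_robin(tasks, resident, num_nodes):
    """Baseline that ignores where the data currently lives."""
    resident = {n: set(resident.get(n, ())) for n in range(num_nodes)}
    assignment, transferred = {}, 0
    for i, (task_id, blocks) in enumerate(tasks):
        node = i % num_nodes
        transferred += missing_bytes(blocks, resident[node])
        resident[node].update(blocks)
        assignment[task_id] = node
    return assignment, transferred


# Hypothetical workload: two nodes, each already holding one 100-byte block.
tasks = [("t0", {"B": 100}), ("t1", {"A": 100}),
         ("t2", {"B": 100}), ("t3", {"A": 100})]
resident = {0: {"A"}, 1: {"B"}}

print(schedule_locality_aware(tasks, resident, 2)[1])  # 0 bytes moved
print(schedule_round_robin(tasks, resident, 2)[1])     # 200 bytes moved
```

In this toy workload the locality-aware assignment moves no data, while round-robin first has to ship each block to the other node. A real scheduler would also need to weigh load imbalance against transfer cost, which this sketch handles only as a tie-breaker.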

Tags: Computer science, CUDA, GPU cluster, nVidia, nVidia GeForce 8800 GT, nVidia GeForce 8800 GTX, nVidia GeForce GTX 280, nVidia Quadro FX 5600, Performance

May 30, 2011 by hgpu
