high performance computing on graphics processing units: hgpu.org

hgpu.org » Tesla S870

Design and Implementation of a PTX Emulation Library

Albert Claret Exojo

Tags: Computer science, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 280, Performance, PTX, Tesla S870, Thesis

November 18, 2011 by hgpu

An Execution Model and Runtime For Heterogeneous Many-Core Systems

Gregory Diamos

Tags: Algorithms, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce 8800 GTS, nVidia GeForce 9800 GX2, nVidia GeForce GTX 280, Optimization, Package, PTX, Tesla C1060, Tesla S870, Thesis

October 2, 2011 by hgpu

Acceleration of large-scale FDTD simulations on high performance GPU clusters

C. Ong, M. Weldon, D. Cyca, M. Okoniewski

Tags: Electrodynamics, FDTD, Finite-difference time-domain, GPU cluster, nVidia, Tesla S870

May 3, 2011 by hgpu

Speculative Execution on Multi-GPU Systems

Gregory Diamos, Sudhakar Yalamanchili

Tags: Computer science, CUDA, GPU cluster, nVidia, nVidia GeForce 9800 GX2, Optimization, Performance, Tesla S870

March 6, 2011 by hgpu

An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

Dana A. Jacobsen, Julien C. Thibault, Inanc Senocak

Tags: CUDA, Fluid dynamics, GPU cluster, MPI, nVidia, nVidia GeForce 9600 GT, nVidia GeForce GTX 260, Tesla C1060, Tesla C870, Tesla S1070, Tesla S870

February 24, 2011 by hgpu

Solving dense linear systems on platforms with multiple hardware accelerators

Gregorio Quintana-Orti, Francisco D. Igual, Enrique S. Quintana-Orti, Robert A. van de Geijn

Tags: Computer science, CUDA, GPU cluster, Linear Algebra, nVidia, Package, Task scheduling, Tesla S870

January 17, 2011 by hgpu

Multi GPU Implementation of Iterative Tomographic Reconstruction Algorithms

Byunghyun Jang, David R. Kaeli, Synho Do, Homer Pien

Tags: Computed tomography, CT, CUDA, GPU cluster, Image processing, Image reconstruction, Medicine, nVidia, Tesla S870

December 26, 2010 by hgpu
