high performance computing on graphics processing units: hgpu.org

Enabling OpenCL on a Configurable, VLIW Chip-Multiprocessor

Samuel J. Parker, Vassilios A. Chouliaras

Tags: CMP, Computer science, OpenCL

November 26, 2013 by hgpu

Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control

Bo Wu, Eddy Z. Zhang, Xipeng Shen

Tags: Algorithms, Benchmarking, CMP, Computer science, CUDA, Heterogeneous systems, nVidia, Optimization, Tesla S1070

September 30, 2011 by hgpu

Tradeoffs in designing accelerator architectures for visual computing

Aqeel Mahesri, Daniel Johnson, Neal Crago, Sanjay J. Patel

Tags: Benchmarking, CMP, Computer science, Computer vision, Rendering, Video encoding, Visualization

August 4, 2011 by hgpu

A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads

Shuai Che, Jeremy W. Sheaffer, Michael Boyer, Lukasz G. Szafaryn, Liang Wang, Kevin Skadron

Tags: Benchmarking, CMP, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480

July 22, 2011 by hgpu

Real-time Visual Tracker by Stream Processing

Oscar Mateo Lozano, Kazuhiro Otsuka

Tags: CMP, Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, Particle filtering, Video tracking

November 3, 2010 by hgpu
