high performance computing on graphics processing units: hgpu.org

hgpu.org » ATI Radeon HD 3870

Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures

Byunghyun Jang, Dana Schaa, Perhaad Mistry, David Kaeli

Tags: ATI, ATI Radeon HD 3870, Brook, Computer science, CUDA, Data parallelism, Memory model, nVidia, nVidia GeForce GTX 285, Review

June 17, 2011 by hgpu

Architecture-Aware Optimization Targeting Multithreaded Stream Computing

Byunghyun Jang, Synho Do, Homer Pien, David Kaeli

Tags: ATI, ATI Radeon HD 3870, Brook, Computer science, Optimization, Programming techniques

March 4, 2011 by hgpu

Data transformations enabling loop vectorization on multithreaded data parallel architectures

Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrigo Dominguez, David Kaeli

Tags: ATI, ATI Radeon HD 3870, ATI Stream, Brook, Code generation, Compilers, Computer science, Optimization

January 7, 2011 by hgpu

Granular visibility queries on the GPU

Thomas Engelhardt, Carsten Dachsbacher

Tags: 3D Graphics and Realism, ATI, ATI Radeon HD 3870, Computer science, nVidia, nVidia GeForce GTX 280, OpenGL, Rendering

November 25, 2010 by hgpu
