high performance computing on graphics processing units: hgpu.org

Posts

Jun, 8

Ameliorating Memory Contention of OLAP operators on GPU Processors

Implementations of database operators on GPU processors have shown dramatic performance improvement compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU, in addition to […]

CUDA
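The bank-conflict behaviour mentioned in the abstract above can be illustrated with a small host-side model. This is a minimal sketch, not the paper's method: it assumes 32 banks and 4-byte words mapped to successive banks (a common NVIDIA configuration), and the names `conflict_degree` and `strided_addresses` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared-memory banks
WORD_BYTES = 4   # assumption: successive 4-byte words map to successive banks


def conflict_degree(byte_addresses):
    """Return the largest number of distinct words any single bank must serve.

    A result of 1 means the access is conflict-free in this model; a result
    of k means the request would be serialized into roughly k passes.
    Threads reading the same word count once (broadcast).
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


def strided_addresses(stride_words, threads=32):
    """Byte addresses touched when thread t accesses word t * stride_words."""
    return [t * stride_words * WORD_BYTES for t in range(threads)]


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        degree = conflict_degree(strided_addresses(stride))
        print(f"stride {stride:2d} words: conflict degree {degree}")
```

In this model a stride of 32 words sends every thread to the same bank, while a stride of 33 words (one word of padding per row) spreads the accesses over all banks again.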

Jun, 8

A Comparison of Algebraic Multigrid Preconditioners using Graphics Processing Units and Multi-Core Central Processing Units

The influence of multi-core central processing units and graphics processing units on several algebraic multigrid methods is investigated in this work. Different performance metrics traditionally employed for algebraic multigrid are reconsidered and reevaluated on these novel computing architectures. Our benchmark results show that with the use of graphics processing units for the solver phase, it […]

OpenCL

Jun, 8

Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems

Heterogeneous CPU-GPU nodes are becoming popular in HPC clusters. We need to rethink algorithms and optimization techniques for such systems depending on the relative performance of CPU vs. GPU. In this paper, we report a performance-optimized particle simulation code, "OTOO", that is based on the octree method, for heterogeneous systems. Main applications of […]

OpenCL

Jun, 8

Parallel random variates generator for GPUs based on normal numbers

Pseudorandom number generators are required for many computational tasks, such as stochastic modelling and simulation. This paper investigates the serial CPU and parallel GPU implementation of a Linear Congruential Generator based on the binary representation of the normal number $\alpha_{2,3}$. We adapted two methods of modular reduction which allowed us to perform most operations in […]

CUDA
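As a rough illustration of how a linear congruential generator can be split across parallel threads, the sketch below uses skip-ahead: each thread jumps directly to its own starting state in O(log n) steps. This is a generic technique, not the generator described in the abstract above. The multiplier, increment, and modulus are the widely published Numerical Recipes constants, chosen here only as an assumption for the example, and `lcg_step`, `lcg_skip`, and `thread_seeds` are hypothetical names.

```python
# Illustrative parameters (assumption): the Numerical Recipes LCG,
# NOT the normal-number-based generator from the paper.
A = 1664525
C = 1013904223
M = 2 ** 32


def lcg_step(x, a=A, c=C, m=M):
    """One step of x -> (a * x + c) mod m."""
    return (a * x + c) % m


def lcg_skip(x, n, a=A, c=C, m=M):
    """Advance the state by n steps in O(log n) time.

    The step is the affine map g(x) = a*x + c. Repeated squaring of g
    yields g^n, which is again an affine map acc_a*x + acc_c.
    """
    acc_a, acc_c = 1, 0            # identity map
    cur_a, cur_c = a % m, c % m    # g^(2^k) for the current bit k
    while n > 0:
        if n & 1:
            # compose: apply cur after acc
            acc_a, acc_c = (cur_a * acc_a) % m, (cur_a * acc_c + cur_c) % m
        # square the current map
        cur_a, cur_c = (cur_a * cur_a) % m, (cur_a * cur_c + cur_c) % m
        n >>= 1
    return (acc_a * x + acc_c) % m


def thread_seeds(seed, num_threads, block):
    """Starting state for each thread when thread t owns outputs
    t*block .. (t+1)*block - 1 of the sequential stream."""
    return [lcg_skip(seed, t * block) for t in range(num_threads)]


if __name__ == "__main__":
    print(thread_seeds(12345, num_threads=4, block=1000))
```

With this partitioning, the concatenated per-thread outputs reproduce the sequential stream exactly, so results do not depend on the number of threads.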

Jun, 6

DMA-Assisted, Intranode Communication in GPU Accelerated Systems

Accelerator awareness has become a pressing issue in data movement models, such as MPI, because of the rapid deployment of systems that utilize accelerators. In our previous work, we developed techniques to enhance MPI with accelerator awareness, thus allowing applications to easily and efficiently communicate data between accelerator memories. In this paper, we extend this […]

CUDA

Jun, 6

Classical Mechanical Hard-Core Particles Simulated in a Rigid Enclosure using Multi-GPU Systems

Hard-core interacting particle methods are of increasing importance for simulations and game applications as well as a tool supporting animations. We develop a high accuracy numerical integration technique for managing hard-core colliding particles of various physical properties such as differing interaction species and hard-core radii using multiple Graphical Processing Unit (m-GPU) computing techniques. We report […]

CUDA

Jun, 6

The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Computing Architectures

With the rise of general purpose computing on graphics processing units (GPGPU), the influence from consumer markets can now be seen across the spectrum of computer architectures. In fact, many of the high-ranking Top500 HPC systems now include these accelerators. Traditionally, GPUs have connected to the CPU via the PCIe bus, which has proved to […]

OpenCL

Jun, 6

Relativistic Hydrodynamics on Graphic Cards

We show how to accelerate relativistic hydrodynamics simulations using graphic cards (graphic processing units, GPUs). These improvements are of highest relevance e.g. to the field of high-energetic nucleus-nucleus collisions at RHIC and LHC where (ideal and dissipative) relativistic hydrodynamics is used to calculate the evolution of hot and dense QCD matter. The results reported here […]

OpenCL

Jun, 6

Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)

Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas new, cutting-edge science goals have been recently proposed requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose […]

CUDA

Jun, 5

European Seminar on Computing, ESCO 2012

ESCO 2012 is the 3rd event in a successful series of interdisciplinary meetings dedicated to modern methods and practices of scientific computing. Main thematic areas include: Multiphysics coupled problems, Higher-order computational methods, Computing with Python, GPU computing, and Cloud computing. Theoretical results as well as applications are welcome. Application areas include, but are not limited […]

Jun, 5

2nd International Conference on Information Management in the Knowledge Economy, IMKE – 2013

The International Conference on Information Management in the Knowledge Economy is a multidisciplinary Conference on digital information management, science and technology. The principal aim of this conference is to bring professionals in academia, research laboratories and industry together, and offer a collaborative platform to address the emerging issues and solutions in digital information science and […]

Jun, 5

Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2013

ASPLOS is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and security, as well as applications and human-computer interaction. The importance of such crosscutting systems research has been growing hand in hand with the amount of parallelism in hardware, the scope of distribution in internet-scale […]

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Most viewed papers (last 30 days)