high performance computing on graphics processing units: hgpu.org

Posts

Nov, 26

PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package

We present a parallel/GPU implementation of our open-source reactive molecular dynamics code, PG-PuReMD (Parallel GPU-Purdue Reactive Molecular Dynamics). Using a variety of innovative algorithms and optimizations, PG-PuReMD achieves a speedup of over 350x relative to a single-CPU implementation when run on a cluster of 36 state-of-the-art GPUs. This is a significant development, since it enables […]

CUDA

Nov, 25

Diagrammatic Determinantal Quantum Monte Carlo Calculations on GPUs

The Diagrammatic Determinantal Quantum Monte Carlo (DDQMC) algorithm [11, s. III] is used to solve quantum impurity models such as the Anderson model [13]. The topic of this dissertation is the efficient porting of an existing implementation of DDQMC to CUDA in order to use GPUs as accelerators. The main characteristics of quantum impurity models […]

CUDA

Nov, 25

Investigating the use of GPUs with a Monte Carlo Astrophysical Simulation

For a given simulation, the most expensive subroutine in the astrophysics code MOCCA (MOnte Carlo Cluster SimulAtor) has been ported to run as a kernel on a GPU (Graphics Processing Unit). The code was accelerated with the CUDA programming model, using PGI CUDA Fortran. The GPU code was run with varying problem […]

CUDA

Nov, 25

Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU

Monte Carlo simulations of the Ising model play an important role in the field of computational statistical physics, and they have revealed many properties of the model over the past few decades. However, the effect of frustration due to random disorder, in particular the possible spin glass phase, remains a crucial but poorly understood problem. […]

CUDA

Nov, 25

Potential Energy Landscapes for the 2D XY Model: Minima, Transition States and Pathways

We describe a numerical study of the potential energy landscape for the two-dimensional XY model (with no disorder), considering up to 100 spins and CPU and GPU implementations of local optimization, focusing on minima and saddles of index one (transition states). We examine both periodic and anti-periodic boundary conditions, and show that the number of […]

CUDA

Nov, 25

Pseudo Random Number Generators on Graphics Processing Units, with Applications in Finance

We have seen more and more interest in taking advantage of GPUs to accelerate simulations. However, the RNGs driving these simulations tend to be existing CPU generators that have been converted for use on GPUs. The result is a generator that does not efficiently utilise the resources and constraints of that architecture. Consequently, the performance […]

CUDA

Nov, 24

Optimising Monte Carlo option pricing using GPUs

Computer modelling has been used for a number of years already to aid financial institutions in making business decisions. One such decision that financial firms are often faced with involves setting fair prices for financial options. Since the process of option pricing can be computationally expensive, methods of optimising it are sought after. One popular […]

CUDA

Nov, 24

Fast approximate k-nearest neighbours search using GPGPU

The k-nearest neighbours (k-NN) search is one of the most critical nonparametric methods used in data retrieval and similarity tasks. In recent years, fast k-NN processing of large amounts of high-dimensional data has been increasingly in demand. Locality-sensitive hashing is a viable solution for computing fast approximate nearest neighbours (ANN) with reasonable accuracy. This chapter presents a […]

CUDA

Nov, 24

A parallel search tree algorithm for vertex cover on graphical processing units

Graphical Processing Units (GPUs) have become popular recently due to their highly parallel shared-memory architectures. The computational challenge posed by NP-hard problems makes them potential targets for GPU-based computation, especially when they are solved by exact exponential-time algorithms. Using the classical NP-hard Vertex Cover problem as a case study, we provide a framework for GPU-based solutions by exploiting […]

CUDA

Nov, 24

Exploring Graphics Processing Unit (GPU) Resource Sharing Efficiency for High Performance Computing

The increasing incorporation of Graphics Processing Units (GPUs) as accelerators has been one of the foremost High Performance Computing (HPC) trends and provides unprecedented performance; however, the prevalent adoption of the Single-Program Multiple-Data (SPMD) programming model brings with it challenges of resource underutilization. In other words, under SPMD, every CPU needs GPU capability available to […]

CUDA

Nov, 24

Real-time Building Airflow Simulation Aided by GPU and FFD

Two recent methods for the fast simulation of building airflow are studied: the fast fluid dynamics (FFD) algorithm and the use of graphics processing units (GPUs) for scientific computing in building engineering. A Google SketchUp plug-in for the FFD program was also developed as a model-creating tool to enhance the accessibility of the operation […]

CUDA

Nov, 23

LoGV: Low-overhead GPGPU Virtualization

Over the last few years, running high performance computing applications in the cloud has become feasible. At the same time, GPGPUs are delivering unprecedented performance for HPC applications. Cloud providers thus face the challenge of integrating GPGPUs into their virtualized platforms, which has proven difficult for current virtualization stacks. In this paper, we present LoGV, […]

CUDA


