high performance computing on graphics processing units: hgpu.org

Posts

Nov, 10

Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code

We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on […]

OpenCL

Nov, 6

Computer Graphics: From Pixels to Programmable Graphics Hardware

Computer Graphics: From Pixels to Programmable Graphics Hardware explores all major areas of modern computer graphics, starting from basic mathematics and algorithms and concluding with OpenGL and real-time graphics. It gives students a firm foundation in today’s high-performance graphics. UP-TO-DATE TECHNIQUES, ALGORITHMS, AND API: The book includes mathematical background on vectors and matrices as well […]

CUDA • OpenCL • OpenGL

Oct, 29

Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linköpings Universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and targets multiple backends, e.g. for a sequential CPU, […]

CUDA • OpenCL

Oct, 27

MPI Parallelization of GPU-based Lattice Boltzmann Simulations

In this thesis, an MPI-parallelized LBM code for a multi-GPU platform has been designed and implemented. The primary goal of the thesis is research on efficient and scalable multi-GPU LBM code that exploits advanced features of modern GPUs and adopts optimization techniques such as overlapping work and communication in heterogeneous CPU-GPU clusters. In […]

OpenCL

Oct, 25

Online Performance Projection for Clusters with Heterogeneous GPUs

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which […]

OpenCL

Oct, 21

Concurrent kernel execution on Graphic Processing Units

General-purpose graphics processing units (GPGPUs) are now used in high performance computing (HPC) for their massively parallel computing capabilities. These devices integrate hundreds of computing units (computing cores). Usually, such a level of parallelism is used to solve simulation problems (heat transfer, …) because of the numerical representation of the simulated environment (matrices). […]

CUDA • OpenCL

Oct, 21

Energy Efficiency Studies of Mont Blanc Applications

In this thesis, the performance and energy efficiency of four different implementations of matrix multiplication, written in OmpSs and OpenCL, are tested and evaluated. The benchmarking is done using an Intel Ivy Bridge Core i7 3770K. The results are evaluated and discussed with regard to different optimization configurations, such as vectorization and multi-threading. Energy measurements are […]

OpenCL

Oct, 18

OpenACC-based Snow Simulation

In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered through its many parallel cores. The GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control […]

CUDA • OpenCL

Oct, 13

Contributions to parallel stochastic simulation: Application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte-Carlo simulations

The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all […]

CUDA • OpenCL

Oct, 10

A Parallel Intermediate Representation for Embedded Languages

This thesis presents a parallel intermediate representation for embedded languages called PIRE, and its incorporation into the Feldspar language. The original Feldspar backend translates the parallel loops of Feldspar to ordinary for loops, meaning that they are not actually parallel in the generated code. We create an alternate backend for the Feldspar project, where the […]

OpenCL

Oct, 5

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages […]

OpenCL

Oct, 5

Parametric GPU Code Generation for Affine Loop Programs

Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources […]

CUDA • OpenCL


gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *


Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC
