high performance computing on graphics processing units: hgpu.org

Posts

Oct, 15

GPU fluids in production: a compiler approach to parallelism

Fluid effects in films require the utmost flexibility, from manipulating a small lick of flame to art-directing a huge tidal wave. While fluid solvers are increasingly making use of GPU hardware, one of the biggest challenges is taking advantage of this technology without compromising on either adaptability or performance. We developed the Jet toolset comprised […]

Oct, 15

Accelerating code on multi-cores with FastFlow

FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code […]

Oct, 15

Efficient Mapping of Streaming Applications for Image Processing on Graphics Cards

In the last decade, there has been a dramatic growth in research and development of massively parallel commodity graphics hardware both in academia and industry. Graphics card architectures provide an optimal platform for parallel execution of many number crunching loop programs from fields like image processing or linear algebra. However, it is hard to efficiently […]

CUDA

Oct, 14

An Analysis of Programmer Productivity versus Performance for High Level Data Parallel Programming

Data parallel programming provides an accessible model for exploiting the power of parallel computing elements without resorting to the explicit use of low level programming techniques based on locks, threads and monitors. The emergence of Graphics Processing Units (GPUs) with hundreds or thousands of processing cores has made data parallel computing available to a wider […]

CUDA

Oct, 14

Accelerating Large Scale Image Analyses on Parallel CPU-GPU Equipped Systems

General-purpose graphical processing units (GPGPUs) have transformed high-performance computing over the past decade. Making great computational power available with reduced cost and power consumption overheads, heterogeneous CPU-GPU-equipped systems have helped to make possible the emerging class of exascale data-intensive applications. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of […]

CUDA

Oct, 14

CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization

As the computational power of GPUs continues to scale with Moore’s Law, an increasing number of applications are becoming limited by memory bandwidth. We propose an approach for programming GPUs with tightly-coupled specialized DMA warps for performing memory transfers between on-chip and off-chip memories. Separate DMA warps improve memory bandwidth utilization by better exploiting available […]

CUDA

Oct, 14

OptiML: An implicitly parallel domain-specific language for machine learning

As the size of datasets continues to grow, machine learning applications are becoming increasingly limited by the amount of available computational power. Taking advantage of modern hardware requires using multiple parallel programming models targeted at different devices (e.g. CPUs and GPUs). However, programming these devices to run efficiently and correctly is difficult, error-prone, and results […]

OpenCL

Oct, 14

Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers

Heterogeneous computers with processors and accelerators are becoming widespread in scientific computing. However, it is difficult to program hybrid architectures and there is no commonly accepted programming model. Ideally, applications should be written in a way that is portable to many platforms, but providing this portability for general programs is a hard problem. By restricting […]

CUDA

Oct, 14

GPU Computing Gems: Jade Edition

This is the second volume of Morgan Kaufmann’s GPU Computing Gems, offering an all-new set of insights, ideas, and practical "hands-on" skills from researchers and developers worldwide. Each chapter gives you a window into the work being performed across a variety of application domains, and the opportunity to witness the impact of parallel GPU computing […]

CUDA

Oct, 14

Towards scalar synchronization in SIMT architectures

An important class of compute accelerators are graphics processing units (GPUs). Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads as a single warp or wavefront and executes this group of scalar threads in […]

CUDA • OpenCL

Oct, 14

A Heterogeneous Parallel Framework for Domain-Specific Languages

Computing systems are becoming increasingly parallel and heterogeneous, and therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability. In previous work we proposed the use of domain-specific languages […]

OpenCL

Oct, 14

Fast Multipole Method vs. Spectral Method for the Simulation of Isotropic Turbulence on GPUs

This paper presents calculations of homogeneous isotropic turbulence at Re_{lambda} = 100 using both a pseudo-spectral method and a fast multipole vortex method on a 256^3 grid. For the vortex method, both algorithmic and hardware acceleration are applied using a highly parallel fast multipole method (FMM) on GPUs. The spectral method uses the FFTW library […]

CUDA

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)