high performance computing on graphics processing units: hgpu.org

Posts

Jun, 25

Room acoustics modelling using GPU-accelerated finite difference and finite volume methods on a face-centered cubic grid

In this paper, a room acoustics simulation using a finite difference approximation on a face-centered cubic (FCC) grid with finite volume impedance boundary conditions is presented. The finite difference scheme is accelerated on an Nvidia Tesla K20 graphics processing unit (GPU) using the CUDA programming language. A performance comparison is made between 27-point finite difference […]

CUDA
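The abstract does not describe its stencil. As a hedged illustration of finite differences on an FCC grid, the sketch below builds the 13-point Laplacian (the centre plus its 12 nearest neighbours). This is a standard textbook construction, not necessarily the paper's scheme, and the function names are hypothetical. A Taylor expansion gives sum_n (u_n - u_0) ≈ 4 h² ∇²u, because each axis is non-zero in 8 of the 12 offsets and the cross terms cancel. In a GPU version, each grid point's stencil evaluation would be an independent thread (an assumption about the mapping, not a claim about the paper's kernel).

```python
import itertools


def fcc_neighbour_offsets():
    """Return the 12 nearest-neighbour offsets of an FCC lattice, in units of h.

    FCC points are the integer points (i, j, k) with i + j + k even. Each point's
    nearest neighbours lie at all permutations of (+/-1, +/-1, 0).
    """
    return [d for d in itertools.product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in d) == 2]  # exactly two non-zero components


def fcc_laplacian(u, x, y, z, h):
    """Approximate the Laplacian of the callable u at (x, y, z) on an FCC stencil.

    Uses sum_n (u_n - u_0) ~= 4 h^2 * Laplacian(u), which is exact for quadratics.
    """
    u0 = u(x, y, z)
    total = sum(u(x + h * dx, y + h * dy, z + h * dz) - u0
                for dx, dy, dz in fcc_neighbour_offsets())
    return total / (4.0 * h * h)
```

For u = x² + y² + z², the sketch returns 6 (up to round-off) at any point, matching the exact Laplacian.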

Jun, 25

Differential Evolution with parallelised objective functions using CUDA

Differential Evolution (DE) algorithms can be used in various fields for problem solving where we need to find an optimal (or close to optimal) solution but have no clear, straightforward method to compute it. Unfortunately, it can take a very long time to produce such a solution when implemented serially or even parallel […]

CUDA
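A minimal sketch of the classic DE/rand/1/bin loop follows. It assumes the objective is written to score the whole population in one batched call. That batched call is the natural place for a GPU to evaluate one individual per thread, and here NumPy stands in for it. The function and parameter names (`differential_evolution`, `F`, `CR`, `pop_size`) are illustrative and are not taken from the paper.

```python
import numpy as np


def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin. `objective` maps a (pop_size, dim) array to (pop_size,) costs."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    cost = objective(pop)  # one batched call: the step a GPU would parallelise
    for _ in range(generations):
        # For each individual i, pick three distinct partners, all different from i.
        partners = np.array([
            rng.choice(np.delete(np.arange(pop_size), i), size=3, replace=False)
            for i in range(pop_size)
        ])
        a, b, c = (pop[partners[:, k]] for k in range(3))
        mutant = np.clip(a + F * (b - c), lo, hi)
        # Binomial crossover; force at least one gene to come from the mutant.
        cross = rng.random((pop_size, dim)) < CR
        cross[np.arange(pop_size), rng.integers(0, dim, size=pop_size)] = True
        trial = np.where(cross, mutant, pop)
        trial_cost = objective(trial)  # whole population scored at once again
        improved = trial_cost <= cost  # greedy one-to-one selection
        pop[improved] = trial[improved]
        cost[improved] = trial_cost[improved]
    best = int(np.argmin(cost))
    return pop[best], cost[best]
```

For example, `differential_evolution(lambda X: np.sum(X ** 2, axis=1), [(-5, 5), (-5, 5)])` should drive the sphere function close to zero. Only the objective evaluation is batched here. The mutation and crossover bookkeeping is cheap by comparison in this sketch.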

Jun, 25

String Algorithm on GPGPU

The concept of general-purpose computing on graphics processors was introduced in the last decade and has since been widely adopted in the engineering industry. Using a Graphics Processing Unit (GPU) as a many-core processing architecture for general-purpose computation can yield performance improvements of several orders of magnitude. One example in leveraging […]

CUDA
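The abstract is truncated before it names the string algorithm. The hypothetical sketch below therefore shows only the simplest data-parallel case: brute-force exact matching, where every alignment offset is tested independently. A CUDA version would typically assign one thread per offset (an assumption), and here NumPy vectorises the same comparison. The sketch assumes ASCII input, so byte offsets equal character offsets.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_positions(text, pattern):
    """Return every start offset where `pattern` occurs in `text` (ASCII only)."""
    t_bytes, p_bytes = text.encode("ascii"), pattern.encode("ascii")
    m = len(p_bytes)
    if m == 0 or m > len(t_bytes):
        return []
    t = np.frombuffer(t_bytes, dtype=np.uint8)
    p = np.frombuffer(p_bytes, dtype=np.uint8)
    windows = sliding_window_view(t, m)   # shape (n - m + 1, m): one row per offset
    hits = np.all(windows == p, axis=1)   # each offset is checked independently
    return np.flatnonzero(hits).tolist()
```

`match_positions("abracadabra", "abra")` returns `[0, 7]`, and overlapping matches are reported too.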

Jun, 25

Parallelization of specialized fluid flow simulator based on lattice Boltzmann method on a multi GPU system

Fluid flow simulations have high computational demands and require large computational resources, and these applications have recently been accelerated with the help of GPUs (Graphics Processing Units). Fluid flow simulation using a discrete method called the lattice Boltzmann (LB) method has also been parallelized on GPUs. In this paper a single-node multi-GPU […]

CUDA
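To show what is being parallelised, here is a minimal single-device sketch of one lattice Boltzmann time step. It uses the common D2Q9 lattice with BGK collision and periodic streaming. The lattice choice, `tau`, and the function names are assumptions, because the abstract does not specify the simulator's model. In a multi-GPU setting, the grid would usually be split into slabs with halo exchange between neighbouring devices after each streaming step (also an assumption about the paper's decomposition).

```python
import numpy as np

# D2Q9 lattice: rest velocity, 4 axis-aligned, 4 diagonal; standard weights.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution, shape (9, ny, nx)."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return rho * W[:, None, None] * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def lbm_step(f, tau):
    """One BGK collide-and-stream step with periodic boundaries."""
    rho = f.sum(axis=0)
    ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
    f = f - (f - equilibrium(rho, ux, uy)) / tau   # local collision, cell-independent
    for i in range(9):                             # streaming: shift along c_i
        f[i] = np.roll(f[i], shift=(C[i, 1], C[i, 0]), axis=(0, 1))
    return f
```

Collision touches only one cell at a time and streaming is a fixed shift per direction, which is why both map well to GPU threads.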

Jun, 25

P-HGRMS: A Parallel Hypergraph Based Root Mean Square Algorithm for Image Denoising

This paper presents a parallel Salt and Pepper (SP) noise removal algorithm for grey-level digital images based on the Hypergraph Based Root Mean Square (HGRMS) approach. HGRMS is a generic algorithm for identifying noisy pixels in any digital image using a two-level hierarchical serial approach. However, for SP noise removal, we reduce this […]

CUDA
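The hypergraph construction behind HGRMS is not described in the excerpt, so the sketch below is a deliberately simplified, hypothetical stand-in and not the paper's method. It flags pixels at the extreme values 0 and 255 as salt-and-pepper candidates. Each flagged pixel is replaced by the root mean square of the unflagged pixels in its 3×3 neighbourhood. Every flagged pixel is handled independently, which is the property a GPU version would exploit.

```python
import numpy as np


def remove_salt_pepper(img):
    """Replace 0/255 pixels with the RMS of their non-extreme 3x3 neighbours."""
    img = np.asarray(img, dtype=np.float64)
    noisy = (img == 0) | (img == 255)          # simple SP detector (assumption)
    padded = np.pad(img, 1, mode="edge")
    padded_noisy = np.pad(noisy, 1, mode="edge")
    out = img.copy()
    for r, c in zip(*np.nonzero(noisy)):       # independent per pixel
        window = padded[r:r + 3, c:c + 3]
        clean = ~padded_noisy[r:r + 3, c:c + 3]
        if clean.any():                        # leave pixel as-is if all neighbours are noisy
            out[r, c] = np.sqrt(np.mean(window[clean] ** 2))
    return out
```

A real detector would need to distinguish genuine black or white image regions from noise. This sketch does not attempt that.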

Jun, 24

GPU Implementation of the Particle Filter

This thesis work analyses the obstacles faced when adapting the particle filtering algorithm to run on massively parallel compute architectures. Graphics processing units are one example of massively parallel compute architectures, which allow the developer to distribute computational load over hundreds or thousands of processor cores. This thesis studies an implementation written for NVIDIA […]

CUDA
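For reference, a minimal bootstrap particle filter for a 1-D random walk observed in Gaussian noise is sketched below. The model, the noise levels `q` and `r`, and the function names are assumptions chosen for illustration. The predict and weight steps are independent per particle. Resampling (systematic resampling here) needs a prefix sum over the weights, which is the step that usually takes the most care to parallelise.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n  # one random offset, n evenly spaced points
    cumulative = np.cumsum(weights)                # prefix sum over normalised weights
    cumulative[-1] = 1.0                           # guard against round-off in the total
    idx = np.searchsorted(cumulative, positions, side="right")
    return np.minimum(idx, n - 1)


def particle_filter(observations, n_particles=1000, q=0.1, r=0.5, seed=0):
    """Bootstrap filter for x_t = x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2)."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # prior N(0, 1)
    estimates = []
    for y in observations:
        particles = particles + rng.normal(0.0, q, n_particles)  # predict
        logw = -0.5 * ((y - particles) / r) ** 2                  # Gaussian log-likelihood
        w = np.exp(logw - logw.max())                             # stabilised weights
        w /= w.sum()
        estimates.append(float(np.dot(w, particles)))             # posterior-mean estimate
        particles = particles[systematic_resample(w, rng)]
    return np.array(estimates)
```

With constant observations of 2.0, the estimate moves from the prior mean of 0 toward 2 within a few steps.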

Jun, 24

Integrating Two-Way Interaction Between Fluids and Rigid Bodies in the Real-Time Particle Systems Library

In the last 15 years, video games have become a dominant form of entertainment. The popularity of video games means children are spending more of their free time playing video games. Usually, the time spent on homework or studying decreases to allow for the extended time spent on video games. In an effort to […]

CUDA

Jun, 24

A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels

We present an approach to investigate the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach allows for quickly identifying any significant differences between the execution of the many blocks and warps. As interesting warps are identified, we allow further investigation of memory behavior by […]

CUDA
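As a rough illustration of the per-warp view this kind of top-down analysis starts from, the sketch below groups a kernel's per-thread global-memory byte addresses into warps. It then counts how many distinct 128-byte segments each warp touches, so that poorly coalesced warps stand out. The 32-thread warp and 128-byte segment granularity are simplifying assumptions (real hardware also works in 32-byte sectors), and the function name is hypothetical.

```python
def segments_per_warp(byte_addresses, warp_size=32, segment_bytes=128):
    """Count distinct memory segments touched by each warp (simplified model)."""
    counts = []
    for start in range(0, len(byte_addresses), warp_size):
        warp = byte_addresses[start:start + warp_size]
        counts.append(len({addr // segment_bytes for addr in warp}))
    return counts
```

Contiguous 4-byte loads give 1 segment per warp, while a 128-byte stride gives 32. That kind of difference between warps is what makes a warp worth inspecting in detail.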

Jun, 24

An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches

With progressive generations and the ever-increasing promise of computing power, GPGPUs have been quickly growing in size, and at the same time, energy consumption has become a major bottleneck for them. The first level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient due […]

CUDA • OpenCL

Jun, 24

Provably Efficient GPU Algorithms

In this paper we present an abstract model for algorithm design on GPUs by extending the parallel external memory (PEM) model with computations in internal memory (commonly known as shared memory in GPU literature) defined in the presence of memory banks and bank conflicts. We also present a framework for designing bank conflict free algorithms […]

CUDA
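The bank-conflict cost that such a model accounts for can be computed directly from a warp's shared-memory word addresses. The sketch assumes 32 banks of 4-byte words with word address modulo 32 selecting the bank, a common NVIDIA configuration used here as a simplification. Threads reading the same word are served by a broadcast and do not conflict. The result is the number of serialised accesses. This sketch does not reproduce the paper's extended PEM model, and the function name is hypothetical.

```python
from collections import defaultdict


def bank_conflict_degree(word_addresses, num_banks=32):
    """Max number of distinct words mapped to one bank in a single warp access."""
    banks = defaultdict(set)
    for addr in word_addresses:
        banks[addr % num_banks].add(addr)  # same word twice -> broadcast, counted once
    return max(len(words) for words in banks.values())
```

Stride 1 is conflict-free, stride 2 is 2-way, and stride 32 is 32-way. Padding the stride to 33 (for example, one extra word per row of a shared tile) makes the access conflict-free again, which is the kind of layout choice a bank-conflict-free algorithm design has to make explicit.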

Jun, 23

The 22nd High Performance Computing Symposium, HPC 2014

The 2014 Spring Simulation Multiconference will feature the 22nd High Performance Computing Symposium (HPC 2014), devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and […]

Jun, 23

Workshop on GPU Programming for Molecular Modeling

The GPU Programming for Molecular Modeling workshop will extend GPU programming techniques to the field of molecular modeling, including subjects such as particle-grid algorithms (electrostatics, molecular surfaces, density maps, and molecular orbitals), particle-particle algorithms with an emphasis on non-bonded force calculations, radial distribution functions in GPU histogramming, single-node multi-GPU algorithms, and GPU clusters. Specific examples […]
