high performance computing on graphics processing units: hgpu.org

Posts

Feb, 1

Embedding OpenCL in GHC Haskell

OpenCL defines a computation model for data-parallel code, supporting compilation to a variety of platforms, including both conventional x86 CPUs and commodity graphics hardware. OpenCL consists of both a programming language for writing data-parallel code, called kernels, and an API, written in C, for interacting with the OpenCL platform and invoking OpenCL kernels. We […]

OpenCL

Feb, 1

Efficient Exploitation of Heterogeneous Platforms for Vertebra Detection in X-Ray Images

Back problems are often related to an abnormal condition of the spine. In this context, conventional X-Ray radiography is the most common modality used in emergency rooms since it is relatively inexpensive and fast. In this paper, we are interested in a method for detecting and extracting vertebrae on X-Ray images. In a medical context, […]

CUDA

Jan, 31

Validation of the PyGBe code for Poisson-Boltzmann equation with boundary element methods

The PyGBe code solves the linearized Poisson-Boltzmann equation using a boundary-integral formulation. We use a boundary element method with a collocation approach, and solve it via a Krylov-subspace method. To do this efficiently, the matrix-vector multiplications in the Krylov iterations are accelerated with a treecode, achieving O(N log N) complexity. The code presents a Python […]

CUDA

Jan, 31

Studies Concerning the ATLAS IBL Calibration Architecture

With the commissioning of the Insertable B-Layer (IBL) in 2013 at the ATLAS experiment, 12 million additional pixels will be added to the current Pixel Detector. While the idea of employing pairs of VME-based Read-Out Driver (ROD) and Back of Crate (BOC) cards in the read-out chain remains unchanged, modifications regarding the IBL calibration procedure […]

CUDA

Jan, 31

OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread-level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, the available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunities for improving performance. A major cause of this […]

CUDA

Jan, 31

Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics

As part of a 3-wk intersession workshop funded by a National Science Foundation Expeditions in Computing award, 15 undergraduate students from the City University of New York collaborated on a study aimed at characterizing the voltage dynamics and arrhythmogenic behavior of cardiac cells for a broad range of physiologically relevant conditions using an in silico […]

CUDA

Jan, 31

The Physics of Singular Dislocation Structures in Continuum Dislocation Dynamics

Dislocations play an important role in the deformation behavior of metals. They interact not only via long-range elastic stress, but also through short-range interactions; they annihilate, tangle, get stuck, and become unstuck. These interactions between dislocations lead to interesting dislocation wall formation at the mesoscales. A recently developed continuum dislocation dynamics model that shows dislocation […]

CUDA

Jan, 31

Survey On The Off-Chip Scheduling of Memory Accesses in the Memory Interface Of GPUs

The SIMD (Single Instruction-Multiple Data) execution model of Graphics Processing Units (GPUs) allows for many concurrent threads to simultaneously request data from the memory subsystem. This imposes a large bandwidth demand on the memory interfaces at each level. Each level of the memory hierarchy needs to provide enough bandwidth in order to ensure good response […]

CUDA

Jan, 31

GPU Enhanced Stream-Based Matrix Multiplication

The paper introduces an algorithm that improves the achieved giga floating-point operations per second (GFLOPS) of matrix multiplication on a graphics processing unit (GPU) by overlapping the data transfers between the host (CPU) and the device (GPU) with the kernel execution. The input matrices are divided into n sections and the output matrix into […]

CUDA

Jan, 31

Particle method on GPU

In this article we present a graphics processing unit (GPU) implementation of a particle method for transport equations. More precisely, the numerical method under consideration is a remeshed particle method. Not only does remeshing particles make simulations more accurate in flows with strong strain, but it also leads to algorithms that are more regular in terms of data structures. […]

OpenCL

Jan, 30

Parallel GPGPU Evaluation of Small Angle X-ray Scattering Profiles in a Markov Chain Monte Carlo Framework

Inference of protein structure from experimental data is of crucial interest in science, medicine and biotechnology. Low-resolution methods, such as small angle X-ray scattering (SAXS), play a major role in investigating important biological questions regarding the structure of proteins in solution. To infer protein structure from SAXS data, it is necessary to calculate the expected […]

OpenCL

Jan, 30

Faster Algorithms for RNA-folding using the Four-Russians method

The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using dynamic programming. Four-Russians is a technique that will reduce the running time for certain dynamic programming algorithms by a factor after a preprocessing step where solutions to all […]

CUDA

