high performance computing on graphics processing units: hgpu.org

Posts

Jan, 31

Studies Concerning the ATLAS IBL Calibration Architecture

With the commissioning of the Insertable B-Layer (IBL) in 2013 at the ATLAS experiment, 12 million additional pixels will be added to the current Pixel Detector. While the idea of employing pairs of VME-based Read-Out Driver (ROD) and Back of Crate (BOC) cards in the read-out chain remains unchanged, modifications regarding the IBL calibration procedure […]

CUDA

Jan, 31

OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread-level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, the available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunities to improve performance. A major cause of this […]

CUDA

Jan, 31

Teaching cardiac electrophysiology modeling to undergraduate students: laboratory exercises and GPU programming for the study of arrhythmias and spiral wave dynamics

As part of a 3-wk intersession workshop funded by a National Science Foundation Expeditions in Computing award, 15 undergraduate students from the City University of New York collaborated on a study aimed at characterizing the voltage dynamics and arrhythmogenic behavior of cardiac cells for a broad range of physiologically relevant conditions using an in silico […]

CUDA

Jan, 31

The Physics of Singular Dislocation Structures in Continuum Dislocation Dynamics

Dislocations play an important role in the deformation behavior of metals. They interact not only via long-range elastic stress but also via short-range interactions; they annihilate, tangle, get stuck, and get unstuck. These interactions between dislocations lead to interesting dislocation wall formation at the mesoscale. A recently developed continuum dislocation dynamics model that shows dislocation […]

CUDA

Jan, 31

Survey On The Off-Chip Scheduling of Memory Accesses in the Memory Interface Of GPUs

The SIMD (Single Instruction-Multiple Data) execution model of Graphics Processing Units (GPUs) allows for many concurrent threads to simultaneously request data from the memory subsystem. This imposes a large bandwidth demand on the memory interfaces at each level. Each level of the memory hierarchy needs to provide enough bandwidth in order to ensure good response […]

CUDA

Jan, 31

GPU Enhanced Stream-Based Matrix Multiplication

The paper introduces an algorithm that improves the achieved giga floating-point operations per second (GFLOPS) of matrix multiplication on a graphics processing unit (GPU) by overlapping the data transfers between the host (CPU) and the device (GPU) with kernel execution. The input matrices are divided into n sections and the output matrix into […]

CUDA

Jan, 31

Particle method on GPU

In this article we present a graphics processing unit (GPU) implementation of a particle method for transport equations. More precisely, the numerical method under consideration is a remeshed particle method. Not only does remeshing particles make simulations more accurate in flows with strong strain, but it also leads to algorithms that are more regular in terms of data structures. […]

OpenCL

Jan, 30

Parallel GPGPU Evaluation of Small Angle X-ray Scattering Profiles in a Markov Chain Monte Carlo Framework

Inference of protein structure from experimental data is of crucial interest in science, medicine and biotechnology. Low-resolution methods, such as small angle X-ray scattering (SAXS), play a major role in investigating important biological questions regarding the structure of proteins in solution. To infer protein structure from SAXS data, it is necessary to calculate the expected […]

OpenCL

Jan, 30

Faster Algorithms for RNA-folding using the Four-Russians method

The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using dynamic programming. Four-Russians is a technique that will reduce the running time for certain dynamic programming algorithms by a factor after a preprocessing step where solutions to all […]

CUDA
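
The O(n^3) dynamic program mentioned in the abstract above can be illustrated with a minimal sketch. This is the classic Nussinov-style recurrence, not the paper's Four-Russians speedup or its GPU implementation. The function name `max_pairs` is hypothetical, and the sketch assumes only Watson-Crick pairs (A-U, G-C) with no minimum loop length; neither assumption comes from the abstract.

```python
# Assumption: only Watson-Crick pairs count as complementary (no G-U wobble pairs).
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_pairs(seq: str) -> int:
    """Return the maximum number of non-crossing complementary base pairs.

    Baseline O(n^3) dynamic program: dp[i][j] holds the best count for the
    subsequence seq[i..j] (inclusive).
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Fill the table by increasing subsequence span.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            best = dp[i][j - 1]
            # Case 2: base j pairs with some base k in [i, j).
            for k in range(i, j):
                if (seq[k], seq[j]) in COMPLEMENTARY:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("GGGAAACCC"))
```

The three nested loops over `span`, `i`, and `k` account for the cubic running time; the innermost loop over `k` is the part that table-lookup techniques such as Four-Russians aim to shorten.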

Jan, 30

Many-threaded Differential Evolution on the GPU

Differential evolution (DE) is an efficient population-based meta-heuristic optimization algorithm that has been applied to many difficult real-world problems. Due to the relative simplicity of its operations and real-encoded data structures, it is very suitable for a parallel implementation on multicore systems and on GPUs, which nowadays reach peak performance of hundreds […]

CUDA

Jan, 30

Scheduling (ir)regular applications on heterogeneous platforms

Computational platforms have become increasingly heterogeneous and parallel in recent years as a consequence of incorporating accelerators whose architectures are parallel and differ from that of the CPU. As a result, several frameworks have been developed to aid in programming these platforms, mainly targeting better productivity ratios. In this context, the GAMA framework is […]

CUDA

Jan, 29

GPUDet: A Deterministic GPU Architecture

Nondeterminism is a key challenge in developing multithreaded applications. Even with the same input, each execution of a multithreaded program may produce a different output. This behavior complicates debugging and limits one’s ability to test for correctness. This non-reproducibility is aggravated on massively parallel architectures like graphics processing units (GPUs) with thousands of concurrent […]

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container