high performance computing on graphics processing units: hgpu.org

Posts

Oct, 26

Artifact-Free Decompression and Zooming of JPEG Compressed Images with Total Generalized Variation

We propose a new model for the improved reconstruction and zooming of JPEG (Joint Photographic Experts Group) images. In the reconstruction process, given a JPEG compressed image, our method first determines the set of possible source images and then specifically chooses one of these source images satisfying additional regularity properties. This is realized by employing […]

CUDA
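The "set of possible source images" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes standard JPEG quantization with round-to-nearest, so a stored integer k with quantization step q admits any original DCT coefficient in the interval [(k - 0.5) q, (k + 0.5) q]. The abstract is truncated, so this is not necessarily the paper's formulation. The function name `coefficient_bounds` and the small 2x2 tables are hypothetical, chosen only for illustration.

```python
def coefficient_bounds(quantized, qtable):
    """Per-coefficient (lower, upper) bounds consistent with the quantized data.

    Assumes round-to-nearest quantization: k = round(c / q), so the original
    coefficient c lies within [(k - 0.5) * q, (k + 0.5) * q].
    """
    lower = [[(k - 0.5) * q for k, q in zip(krow, qrow)]
             for krow, qrow in zip(quantized, qtable)]
    upper = [[(k + 0.5) * q for k, q in zip(krow, qrow)]
             for krow, qrow in zip(quantized, qtable)]
    return lower, upper


# Hypothetical 2x2 excerpt of a block (real JPEG blocks are 8x8).
quantized = [[3, -1],
             [0, 2]]
qtable = [[16, 11],
          [12, 14]]

lower, upper = coefficient_bounds(quantized, qtable)
print(lower)
print(upper)
```

Any image whose DCT coefficients fall inside these bounds would compress to the same JPEG data, which is why an additional regularity criterion is needed to pick one candidate.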

Oct, 26

Finite Pointset Method for 2D Dam-Break Problem with GPU-Acceleration

A Lagrangian particle scheme is applied to the projection method for the incompressible Navier-Stokes equations. Spatial derivatives are approximated with the Finite Pointset method, which is computationally expensive, so GPU computation is used to speed up the calculations. Numerical solutions are obtained for the broken dam problem and compared with the analytical solutions. […]

CUDA

Oct, 26

BrainCove: A Tool for Voxel-wise fMRI Brain Connectivity Visualization

Functional brain connectivity from fMRI studies has become an important tool in studying functional interactions in the human brain as a complex network. Most recently, research has started focusing on whole brain functional networks at the voxel-level, where fMRI time-signals at each voxel are correlated with every other voxel in the brain to determine their […]

OpenCL • OpenGL

Oct, 26

Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)

Graphics processing units (GPUs) offer significant speedups over CPUs for certain classes of applications. However, maximizing GPU performance can be a difficult task due to the relatively high programming complexity as well as frequent hardware changes. Important performance optimizations are applied by the GPU compiler ahead of time and require fixed parameter values at compile […]

CUDA

Oct, 26

An Environment to Support GPU and Multicore Programming for Rapid, High Performance, Application Deployment

Homogeneous multicore processors, heterogeneous multicore processors, high performance accelerators, and other heterogeneous architectures offer significantly more computing potential than traditional single-core processors. Computer systems composed of these specialized processing elements are increasingly common. Due to the increased complexity of these architectures, programming for them has become increasingly complex and error-prone. Each of these architectures […]

CUDA • OpenCL

Oct, 26

Parallel Verlet neighbor list algorithm for GPU-optimized MD simulations

Understanding protein and RNA biomolecular folding and assembly processes has important applications because misfolding is associated with diseases like Alzheimer’s and Parkinson’s. However, simulating biologically relevant biomolecules on timescales that correspond to biological functions is an extraordinary challenge due to bottlenecks that mainly arise in force calculations. We briefly review the molecular dynamics (MD) […]

Oct, 25

Automatic Generation Of Application-Specific Accelerators for FPGAs from Python Loop Nests

We present Three Fingered Jack, a highly productive approach to mapping vectorizable applications to the FPGA. Our system applies traditional dependence analysis and reordering transformations to a restricted set of Python loop nests. It does this to uncover parallelism and divide computation between multiple parallel processing elements (PEs) that are automatically generated through high-level synthesis […]

Oct, 25

GPU-Based Asynchronous Global Optimization with Particle Swarm

The recent upsurge in research into general-purpose applications for graphics processing units (GPUs) has made low-cost high-performance computing increasingly accessible. Many global optimization algorithms that have previously benefited from parallel computation are now poised to take advantage of general-purpose GPU computing as well. In this paper, a global parallel asynchronous particle swarm optimization […]

CUDA

Oct, 25

Modular & Scalable Ultrasound Platform with GPU Processing

The objective of our project is to develop a complete ultrasound platform with real-time GPU processing. The platform is designed to be modular and scalable both in the number of ultrasound channels (64-256) and in communication bandwidth and processing power. By standardizing on the PCIe switch fabric, we are planning to integrate all the […]

CUDA

Oct, 25

A structural analysis of the A5/1 state transition graph

We describe efficient algorithms to analyze the cycle structure of the graph induced by the state transition function of the A5/1 stream cipher used in GSM mobile phones and report on the results of the implementation. The analysis is performed in five steps utilizing HPC clusters, GPGPU and external memory computation. A great reduction of […]

CUDA

Oct, 25

A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities

We consider the problem of computing reachability probabilities: given a Markov chain, an initial state of the Markov chain, and a set of goal states of the Markov chain, what is the probability of reaching any of the goal states from the initial state? This problem can be reduced to solving a linear equation Ax=b […]

CUDA
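The reduction to a linear system described in the abstract above can be sketched as follows. This is a minimal illustration, not necessarily one of the methods the paper compares: it uses a plain fixed-point iteration x <- P x with the goal states pinned to probability 1, which converges to the solution of the corresponding system Ax=b. The function name `reachability_probabilities` and the 4-state example chain are hypothetical.

```python
def reachability_probabilities(P, goal, tol=1e-12, max_iter=10_000):
    """Approximate the probability of reaching any goal state from each state.

    P is a row-stochastic transition matrix (list of lists); goal is a set of
    state indices. Iterates x <- P x with x fixed to 1.0 on goal states.
    """
    n = len(P)
    x = [1.0 if s in goal else 0.0 for s in range(n)]
    for _ in range(max_iter):
        new_x = [
            1.0 if s in goal else sum(P[s][t] * x[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once no entry changes by more than the tolerance.
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical chain: state 0 is an absorbing failure state, state 3 is the goal,
# and states 1 and 2 move left or right with probability 0.5 each.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
probs = reachability_probabilities(P, goal={3})
print(probs)  # approximately [0, 1/3, 2/3, 1]
```

Each update of an entry depends only on the previous iterate, so all entries can be computed independently, which is the property that makes iterations of this kind natural candidates for a GPU implementation.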

Oct, 24

GPU Implementation of the STA Algorithm on I/Q Data

GPU computing is a new paradigm in high performance signal and image processing. The massively parallel processing offered by GPUs greatly accelerates computations when they are properly implemented. Ultrasound image reconstruction is one of these highly parallel classes of algorithms. The massive amount of multichannel input data and the deterministic order of execution make US […]

CUDA

