high performance computing on graphics processing units: hgpu.org

Posts

May, 1

Application of GPUs for the Calculation of Two Point Correlation Functions in Cosmology

In this work, we have explored the advantages and drawbacks of using GPUs instead of CPUs in the calculation of a standard 2-point correlation function algorithm, which is useful for the analysis of Large Scale Structure of galaxies. Taking into account the huge volume of data foreseen in upcoming surveys, our main goal has been […]

CUDA
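A two-point correlation estimate is usually built from a histogram of pair separations, and that pair count is the part typically offloaded to a GPU. The sketch below is a plain CPU version of brute-force pair counting. It is only an illustration, not the method of the paper above: the function name `pair_separation_histogram`, the bin layout, and the brute-force approach are all assumptions, since the excerpt does not describe the algorithm in detail.

```python
import math


def pair_separation_histogram(points, bin_edges):
    """Count unique point pairs whose separation falls in each half-open bin.

    points    : list of coordinate tuples (hypothetical catalogue positions)
    bin_edges : increasing list of separations; bin b covers [edge[b], edge[b+1])
    """
    counts = [0] * (len(bin_edges) - 1)
    n = len(points)
    for i in range(n):
        # Each (i, j) pair is independent of the others, which is what
        # makes this double loop a natural fit for data-parallel hardware.
        for j in range(i + 1, n):
            r = math.dist(points[i], points[j])
            for b in range(len(counts)):
                if bin_edges[b] <= r < bin_edges[b + 1]:
                    counts[b] += 1
                    break
    return counts


if __name__ == "__main__":
    galaxies = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(pair_separation_histogram(galaxies, [0.0, 1.5, 3.0]))
```

The cost grows quadratically with the number of points, which is why large surveys motivate parallel hardware for this step.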

Apr, 28

Solving Stochastic Differential Equations Using General Purpose Graphics Processing Unit

Stochastic Differential Equations are important in many models of various physical or artificial phenomena. To get meaningful results it is desirable to solve the initial value numerical integration problem for a sufficiently large ensemble of realizations. Each element of the ensemble has the same form, thus exposing inherent data-parallelism. We implemented a cross-platform library written […]

CUDA
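The data-parallelism mentioned in the abstract above comes from every realization following the same update rule with its own random numbers. A minimal CPU sketch of that idea is shown below, assuming the Euler-Maruyama scheme. The excerpt does not name the integrator or the library's API, so the function `euler_maruyama_ensemble` and all of its parameters are hypothetical.

```python
import math
import random


def euler_maruyama_ensemble(drift, diffusion, x0, t_end, n_steps, n_paths, seed=0):
    """Integrate dX = drift(X) dt + diffusion(X) dW for n_paths realizations.

    Returns the list of final values X(t_end), one per realization.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    finals = []
    # The outer loop is the ensemble: each iteration is independent,
    # so on a GPU it could map to one thread per realization.
    for _path in range(n_paths):
        x = x0
        for _step in range(n_steps):
            dw = rng.gauss(0.0, 1.0) * sqrt_dt  # Wiener increment
            x = x + drift(x) * dt + diffusion(x) * dw
        finals.append(x)
    return finals


if __name__ == "__main__":
    # Ornstein-Uhlenbeck-like example: dX = -X dt + 0.3 dW
    ends = euler_maruyama_ensemble(lambda x: -x, lambda x: 0.3, 1.0, 1.0, 200, 1000)
    print(sum(ends) / len(ends))
```

The ensemble mean printed at the end is the kind of aggregate statistic that needs many realizations to be meaningful.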

Apr, 28

Random Walks based Multi-Image Segmentation: Quasiconvexity Results and GPU-based Solutions

We recast the Cosegmentation problem using Random Walker (RW) segmentation as the core segmentation algorithm, rather than the traditional MRF approach adopted in the literature so far. Our formulation is similar to previous approaches in the sense that it also permits Cosegmentation constraints (which impose consistency between the extracted objects from >= 2 images) using […]

CUDA

Apr, 28

High-Performance Code Generation for Stencil Computations on GPU Architectures

Stencil computations arise in many scientific computing domains, and often represent time-critical portions of applications. There is significant interest in offloading these computations to high-performance devices such as GPU accelerators, but these architectures offer challenges for developers and compilers alike. Stencil computations in particular require careful attention to off-chip memory access and the balancing of […]

OpenCL
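For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed pattern of its neighbours. The sketch below shows a 1D three-point averaging stencil with double buffering. It is a generic illustration rather than the code generator from the paper above, and the function name `jacobi_1d` is hypothetical.

```python
def jacobi_1d(values, n_sweeps):
    """Apply a three-point averaging stencil n_sweeps times.

    Boundary points are held fixed; interior points are replaced by the
    mean of themselves and their two neighbours from the previous sweep.
    """
    current = list(values)
    for _sweep in range(n_sweeps):
        nxt = current[:]  # second buffer, so reads never see this sweep's writes
        for i in range(1, len(current) - 1):
            # Every i reads three neighbouring inputs and does little arithmetic,
            # which is why memory traffic tends to dominate stencil performance.
            nxt[i] = (current[i - 1] + current[i] + current[i + 1]) / 3.0
        current = nxt
    return current


if __name__ == "__main__":
    print(jacobi_1d([0.0, 3.0, 0.0], 1))
```

Each interior point can be updated independently within a sweep, while consecutive sweeps must run in order.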

Apr, 28

The AGILE library for image reconstruction in biomedical sciences using graphics card hardware acceleration

Fast image reconstruction is a critical requirement for an imaging modality to be adopted in the field of clinical and pre-clinical sciences. While programs become faster due to more powerful hardware, at the same time data size increases and the need for advanced (and often computationally more demanding) reconstruction algorithms arises. A cheap way to achieve a […]

CUDA

Apr, 28

Characterizing and Improving the Use of Demand-Fetched Caches in GPUs

Initially introduced as special-purpose accelerators for games and graphics code, graphics processing units (GPUs) have emerged as widely-used high-performance parallel computing platforms. GPUs traditionally provided only software-managed local memories (or scratchpads) instead of demand-fetched caches. Increasingly, however, GPUs are being used in broader application domains where memory access patterns are both harder to analyze and […]

CUDA

Apr, 27

Analytical Study of Various High Performance Computing Paradigms

Nowadays, various computing paradigms are present in the IT industry. Cloud, Grid, Cluster and General-Purpose Graphics Processing Unit (GP-GPU) computing are High Performance Computing (HPC) technologies and are growing very quickly. These are undoubtedly today’s most enticing technology areas due to the various benefits they offer, such as virtualization, high performance and less managerial overhead […]

Apr, 27

Real-time particle simulation of fluids

Physically plausible simulation of fluids in real-time is mostly achieved using approximations of the Navier-Stokes equations. Recent methods simulate fluids by exploiting the capabilities of modern graphics processing units. This article describes a method called Smoothed Particle Hydrodynamics (SPH), which is a numerical approximation of the Navier-Stokes equations. The real-time simulation allows for interactivity which […]

CUDA

Apr, 27

Hybrid CPU/GPU KD-Tree Construction for Versatile Ray Tracing

We propose a hybrid CPU-GPU ray-tracing implementation based on an optimal KD-tree as the acceleration structure. The construction and traversal of this KD-tree benefit from both the CPU and the GPU to achieve high-performance ray-tracing on mainstream hardware. Our approach, flexible enough to use only a single computing unit (CPU or GPU), is able to […]

CUDA

Apr, 27

The Case for Higher Computational Density in the Memory-Bound FDTD Method within Multicore Environments

It is argued here that more accurate, though more compute-intensive, alternative algorithms to certain computational methods, which are deemed too inefficient and wasteful when implemented within serial codes, can be more efficient and cost-effective when implemented in parallel codes designed to run on today’s multicore and many-core environments. This argument is most germane to methods […]

CUDA

Apr, 27

Matrix Multiplication with CUDA – A basic introduction to the CUDA programming model

We use the example of Matrix Multiplication to introduce the basics of GPU computing in the CUDA environment. It is assumed that the student is familiar with C programming, but no other background is assumed. The goal of this module is to show the student how to offload parallel computations to the graphics card, when […]

CUDA
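The offloading idea in the abstract above rests on the fact that every element of the product matrix can be computed independently. The sketch below emulates that decomposition on the CPU, with one function call standing in for the work of one GPU thread. It is not taken from the module itself: the names `matmul_element` and `matmul` are hypothetical, and a real CUDA kernel would derive `row` and `col` from block and thread indices instead of a Python loop.

```python
def matmul_element(a, b, row, col):
    """Work of one hypothetical thread: the dot product for C[row][col]."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    """Multiply matrices given as lists of rows.

    The nested comprehension plays the role of the thread grid: each
    (row, col) pair is handled independently of every other pair.
    """
    n_rows, n_cols = len(a), len(b[0])
    return [
        [matmul_element(a, b, row, col) for col in range(n_cols)]
        for row in range(n_rows)
    ]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because no output element depends on another, the order in which the pairs are processed does not affect the result.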

Apr, 25

High-performance sparse matrix-vector multiplication on GPUs for structured grid computations

In this paper, we address efficient sparse matrix-vector multiplication for matrices arising from structured grid problems with high degrees of freedom at each grid node. Sparse matrix-vector multiplication is a critical step in the iterative solution of sparse linear systems of equations arising in the solution of partial differential equations using uniform grids for discretization. […]

CUDA
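As background for the abstract above, the sketch below shows sparse matrix-vector multiplication in the common compressed sparse row (CSR) layout, applied to a small tridiagonal matrix of the kind a 1D uniform grid produces. CSR is an assumption chosen for illustration. The excerpt does not say which storage format the paper uses, and the function name `csr_spmv` is hypothetical.

```python
def csr_spmv(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values      : nonzero entries, row by row
    col_indices : column index of each entry in values
    row_ptr     : row r owns entries row_ptr[r] .. row_ptr[r + 1] - 1
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Rows are independent, so a GPU version could assign one thread
        # (or one group of threads) to each row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    col_indices = [0, 1, 0, 1, 2, 1, 2]
    row_ptr = [0, 2, 5, 7]
    print(csr_spmv(values, col_indices, row_ptr, [1.0, 2.0, 3.0]))
```

The indirect read `x[col_indices[k]]` is the irregular memory access that structured-grid formats generally try to avoid by exploiting the known neighbour pattern.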

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)