high performance computing on graphics processing units: hgpu.org

Posts

May, 17

22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2014

Special Session on GPU Computing

The Special Session on GPU Computing and Hybrid Computing aims to provide a forum for scientific researchers and engineers on hot topics related to GPU computing and hybrid computing, with special emphasis on applications, performance analysis, programming models and mechanisms for mapping codes. Topics of interest include, but are not […]

May, 16

Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs

We present a method for estimating the point spread function (PSF) of solar surface images acquired from ground telescopes and degraded by the atmosphere. The estimation is done by retrieving the wavefront phase using a set of short exposures, the speckle reconstruction of the observed object and a PSF model parametrized by Zernike polynomials. Estimates of […]

OpenCL

May, 16

Extended Data Collection: Analysis of Cache Behavior and Performance of Different BVH Memory Layouts for Tracing Incoherent Rays

With CPUs moving towards many-core architectures and GPUs becoming more general purpose architectures, path tracing can now be well parallelized on commodity hardware. While parallelization is trivial in theory, properties of real hardware make efficient parallelization difficult, especially when tracing incoherent rays. We investigate how different bounding volume hierarchy (BVH) and node memory layouts as […]

CUDA

May, 16

Analyzing Locality of Memory References in GPU Architectures

In this paper we advocate formal locality analysis on memory references of GPGPU kernels. We investigate the locality of reference at different cache levels in the memory hierarchy. At the L1 cache level, we look into the locality behavior at the warp-, the thread block- and the streaming multiprocessor-level. Using matrix multiplication as a case […]

May, 16

Stabilized Backward Diffusion for Partial Volume Correction

This paper proposes a novel algorithm for correcting the Partial Volume Effect in Positron Emission Tomography (PET) images, using registered Computed Tomography (CT) data to enhance the blurred PET image. The algorithm is based on a forward-and-backward anisotropic heat equation solver that deblurs the PET image along CT gradients. A forward diffusion force is also […]

OpenCL

May, 16

Gauge fixing in lattice QCD with multi-GPUs

Here we present the cuLGT code for gauge fixing in lattice gauge field theories with graphics processing units (GPUs). Implementations for SU(3) Coulomb, Landau and maximally Abelian gauge fixing are available, and the overrelaxation, stochastic relaxation and simulated annealing algorithms are supported. Performance results for single and multiple GPUs are given.

CUDA

May, 15

CUDA implementation of the solution of a system of linear equations arising in an hp-Finite Element code

The finite element method (FEM) has proven to be one of the most efficient methods for solving differential equations. Designed to run on different computer architectures, FEM codes have benefited from technological improvements that, over the years, have enabled the fast solution of ever larger problems. Among these improvements, we emphasize the development of the GPU (Graphics Processing Unit). Scientific programming in […]

CUDA

May, 15

Perturbation Functions in Computer Graphics

The problem of real-time photorealistic imaging is discussed. New techniques for specifying free forms without approximating them by polygons are considered. Free forms based on perturbation functions have the advantage of a spline representation of surfaces, that is, a high degree of smoothness, as well as the advantage of an arbitrary form for a small number of perturbation […]

CUDA

May, 15

The Lattice Boltzmann Equation Method for Complex Flows

The lattice Boltzmann equation (LBE) method is a promising technique for simulating fluid flows and modeling complex physics. Because the LBE model is based on microscopic models and mesoscopic kinetic equations, it offers many advantages for the study of multi-component or multiphase flows. However, there are still challenges encountered when dealing with thermal effects and […]

May, 15

Approximative inference for multivariate functional data on massively parallel processors

With continually increasing data sizes, the relevance of the big n problem of classical likelihood approaches is greater than ever. This paper considers functional data, and presents operator approximations, where observations are embedded in function space, and likelihood calculations are carried out in the functional domain. The resulting approximated problems are naturally parallel and can […]

CUDA

May, 15

Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms

Fractal compression is an efficient technique for image and video encoding that uses the concept of self-referential codes. Although fractal techniques offer compression quality that matches or exceeds that of traditional techniques, with a simpler and faster decoding process, they have not gained widespread acceptance due to the computationally intensive nature of their encoding algorithm. In this paper, […]

OpenCL

May, 13

Interaction and Visualization Techniques for Immersive Exploration and Perception of 3D datasets

The objective in this case is not only to be realistic, but also to provide new and intelligible ways of representing models. This raises new issues in data perception. The perception of complex data, especially with regard to visual feedback, remains an open question, and it is the subject of this work. This PhD thesis […]

OpenGL

* * *

Recent source codes

Kernel Library for LLM Serving

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Genten: Software for Generalized Tensor Decompositions by Sandia National Laboratories

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

Pinocchio: PINpointing Orbit Crossing Collapsed Hierarchical Objects

KernelCoder: trained on a curated dataset of reasoning traces and CUDA kernel pairs

VibeCodeHPC - Multi Agentic Vibe Coding for HPC

Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework

exa-AMD: Exascale Accelerated Materials Discovery
