high performance computing on graphics processing units: hgpu.org

Posts

Nov, 5

Acceleration of Deterministic Boltzmann Solver with Graphics Processing Units

GPU-accelerated computing of the Boltzmann collision integral is studied using a deterministic method with piecewise approximation of the velocity distribution function and analytical integration over collision impact parameters. A 40-fold speedup over CPU calculations is achieved for a 3D problem of collisional relaxation of a bi-Maxwellian velocity distribution.

CUDA

Nov, 5

Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI multi GPU Backends with Subdomain Support

We propose an approach for high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures like Multi-Core CPUs, GPUs or FPGAs. This way, a scientist can focus on his domain of expertise by describing his algorithms generically without the need to have knowledge of specific hardware architectures, […]

CUDA

Nov, 5

Challenges for compiler support for exascale computing

The compiler is central to the translation of the software we want users to write to the machine code we want to run. The scale of the applications and the choices of programming languages by users greatly complicate the role for the compiler and its analysis. The languages we use frequently don’t support rich optimizations […]

CUDA

Nov, 5

CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking

Diffusion Tensor Imaging (DTI) allows noninvasive measurement of the diffusion of water in fibrous tissue. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU). This algorithm, […]

CUDA

Nov, 5

Implementation of a multigrid solver on GPU for Stokes equations with strongly variable viscosity based on Matlab and CUDA

Stokes equations have been used in numerical simulations of geodynamic processes such as mantle convection, lithospheric deformation and lava flow. To implement a solver for these equations, we introduce the multigrid method. The multigrid method is commonly used to reduce the number of iteration steps for solving the elliptic partial differential equation […]

CUDA

Nov, 5

A mobile robot navigation with use of CUDA parallel architecture

In this article we present a navigation system of a mobile robot based on parallel calculations. It is assumed that the robot is equipped with a 3D laser range scanner. The system is essentially based on a dual grid-object, where labels are attached to detected objects (such maps can be used in navigation based on […]

CUDA

Nov, 5

Improving CUDASW++, a Parallelization of Smith-Waterman for CUDA Enabled Devices

CUDASW++ is a parallelization of the Smith-Waterman algorithm for CUDA graphical processing units that computes the similarity scores of a query sequence paired with each sequence in a database. The algorithm uses one of two kernel functions to compute the score between a given pair of sequences: the inter-task kernel or the intra-task kernel. We […]

CUDA
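As a rough illustration of the scoring step described in the abstract above, here is a minimal CPU-only sketch of the Smith-Waterman recurrence that scores one query against every sequence in a database. It is not CUDASW++ and does not model its inter-task or intra-task kernels. The function names, the match/mismatch values and the linear gap penalty are all assumptions made for this example.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between two strings.

    Assumptions for this sketch: a flat match/mismatch score and a
    linear gap penalty. Only two rows of the DP table are kept.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP table
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def score_database(query, database, **scoring):
    """Score the query against each database sequence independently."""
    return [smith_waterman_score(query, seq, **scoring) for seq in database]


if __name__ == "__main__":
    print(score_database("ACGT", ["ACGT", "TTTT", "ACG"]))
```

Each database sequence is scored independently of the others, so the outer loop in `score_database` is the part that lends itself to running many alignments at once.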

Nov, 5

Power Flow Analysis on CUDA-based GPU

This major qualifying project investigates the algorithm and the performance of using the CUDA-based Graphics Processing Unit for power flow analysis. The accomplished work includes the design, implementation and testing of the power flow solver. Comprehensive analysis shows that the parallel algorithm runs several times faster than the sequential algorithm.

CUDA

Nov, 5

Real-time Flame Rendering with GPU and CUDA

This paper proposes a method of flame simulation based on Lagrange process and chemical composition. The method is grid-free, which avoids the problems associated with grids. The turbulent movement of the flame is described by the Lagrange process, and chemical composition is added to the flame simulation to increase the realism of the flame. For real-time applications, this […]

CUDA, OpenGL

Nov, 4

Accelerating a TV based JPEG decompression algorithm with Cuda

In previous works, we have developed a mathematical model for artifact-free decompression of JPEG images. There, the problem of finding an artifact-free decompression for a given JPEG compressed image is related to a convex minimization problem. We use a primal-dual algorithm to solve this problem, for which we have developed a Matlab and C++ […]

CUDA

Nov, 4

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

For the blind separation of convolutive mixtures, substantial processing power is required. In this paper we propose a massively parallel implementation of Independent Component Analysis in the time-frequency domain using the processing power of current graphics adapters within the CUDA framework. The often used approach for solving the separation task is the […]

CUDA

Nov, 4

Parallelization of maximum likelihood fits with OpenMP and CUDA

Data analyses based on maximum likelihood fits are commonly used in the high energy physics community for fitting statistical models to data samples. This technique requires the numerical minimization of the negative log-likelihood function. MINUIT is the most common package used for this purpose in the high energy physics community. The main algorithm in this […]

CUDA
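To make the idea in the abstract above concrete, here is a minimal sketch of a maximum likelihood fit done by numerically minimizing a negative log-likelihood. It does not use MINUIT or reproduce the paper's method. The exponential model, the golden-section search, the sample values and all function names are assumptions chosen to keep the example self-contained.

```python
import math


def negative_log_likelihood(rate, samples):
    """NLL of i.i.d. samples under the PDF f(x) = rate * exp(-rate * x).

    Each sample contributes an independent term to the sum.
    """
    return -sum(math.log(rate) - rate * x for x in samples)


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left interior point
        d = a + inv_phi * (b - a)  # right interior point
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


if __name__ == "__main__":
    samples = [0.5, 1.5, 2.0, 0.8, 1.2]  # hypothetical data
    fitted = golden_section_minimize(
        lambda r: negative_log_likelihood(r, samples), 0.01, 10.0
    )
    # For this model the estimate should agree with 1 / mean(samples).
    print(fitted, len(samples) / sum(samples))
```

Because the per-sample terms of the sum are independent, evaluating the negative log-likelihood is the step that can be split across threads or GPU cores, while the minimizer itself stays sequential.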

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
