high performance computing on graphics processing units: hgpu.org

Posts

Dec, 6

A Duality Based Approach for Realtime TV-L1 Optical Flow

Variational methods are among the most successful approaches to calculate the optical flow between two image frames. A particularly appealing formulation is based on total variation (TV) regularization and the robust L1 norm in the data fidelity term. This formulation can preserve discontinuities in the flow field and offers an increased robustness against illumination changes, […]

OpenGL

Dec, 6

Real Time Capture of Audio Images and their Use with Video

Spherical microphone arrays provide an ability to compute the acoustical intensity corresponding to different spatial directions in a given frame of audio-data. These intensities may be exhibited as an image and these images updated at a high frame rate to achieve a video stream if the data capture and intensity computations can be performed sufficiently […]

CUDA

Dec, 5

Using Graphics Hardware for Enhancing Edge and Circle Detection

A broad family of problems in computer vision and image analysis require edge and circle detection. This paper explores the properties of the Hough transform for such tasks, improving them under a novel implementation on commodity graphics hardware. We demonstrate both a faster execution and a more reliable detection under different scenarios and a range […]

OpenGL

Dec, 5

Optical Flow Computation on Compute Unified Device Architecture

In this study, the implementation of an image processing technique on Compute Unified Device Architecture (CUDA) is discussed. CUDA is a new hardware and software architecture developed by NVIDIA Corporation for general-purpose computation on graphics processing units. CUDA features an on-chip shared memory with very fast general read and write access, which enables threads […]

CUDA

Dec, 5

GPU architecture overview

Abstract not available

Dec, 5

Visibility Sampling on GPU and Applications

In this paper, we show how recent GPUs can be used to very efficiently and conveniently sample the visibility between two surfaces, given a set of occluding triangles. We use bitwise arithmetic to evaluate, encode, and combine the samples blocked by each triangle. In particular, the number of operations is almost independent of the number […]

Dec, 5

Simulation and interaction of fluid dynamics

In fluid simulation, the fluids and their surroundings may greatly change properties such as shape and temperature simultaneously, and different surroundings would characterize different interactions, which would change the shape and motion of the fluids in different ways. On the other hand, interactions among fluid mixtures of different kinds would generate more comprehensive behavior. […]

Dec, 5

GPU physics

Abstract not available

CUDA • OpenGL

Dec, 5

Algorithmic Differentiation: Application to Variational Problems in Computer Vision

Many vision problems can be formulated as minimization of appropriate energy functionals. These energy functionals are usually minimized based on the calculus of variations (Euler-Lagrange equation). Once the Euler-Lagrange equation has been determined, it needs to be discretized in order to implement it on a digital computer. This is not a trivial task and is […]

Dec, 5

Fast continuous collision detection among deformable models using graphics processors

We present an interactive algorithm to perform continuous collision detection between general deformable models using graphics processors (GPUs). We model the motion of each object in the environment as a continuous path and check for collisions along the paths. Our algorithm precomputes the chromatic decomposition for each object and uses visibility queries on GPUs to […]

Dec, 5

Cache-efficient numerical algorithms using graphics hardware

We present cache-efficient algorithms for scientific computations using graphics processing units (GPUs). Our approach is based on mapping the nested loops in the numerical algorithms to the texture mapping hardware and efficiently utilizing GPU caches. This mapping exploits the inherent parallelism, pipelining and high memory bandwidth on GPUs. We further improve the performance of numerical […]

OpenGL

Dec, 5

Voice Command Recognition with Dynamic Time Warping (DTW) using Graphics Processing Units (GPU) with Compute Unified Device Architecture (CUDA)

Recently, we have been witnessing a huge evolution in the development of high-performance computing platforms. Among these platforms, the GPU (Graphics Processing Unit), driven by a game industry constantly demanding more graphics processing power, has evolved from a simple graphics card into a general-purpose parallel data processing device. This article shows the GPU’s viability […]

CUDA

