high performance computing on graphics processing units: hgpu.org

Posts

Dec, 26

Multi GPU Implementation of Iterative Tomographic Reconstruction Algorithms

Although iterative reconstruction techniques (IRTs) have been shown to produce images of superior quality over conventional filtered back projection (FBP) based algorithms, the use of IRT in a clinical setting has been hampered by the significant computational demands of these algorithms. In this paper we present results of our efforts to overcome this hurdle by […]

CUDA
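Iterative reconstruction repeatedly forward-projects the current image estimate, compares the result with the measured projections and back-projects the residual. A minimal sketch of the simplest member of this family, the algebraic reconstruction technique (ART, also known as the Kaczmarz method), is shown below; it is not the multi-GPU method of the paper, and the function name `art_reconstruct`, the ray encoding and the 2x2 phantom are hypothetical illustrations.

```python
def art_reconstruct(rays, measurements, n_pixels, sweeps=200, relax=1.0):
    """Algebraic Reconstruction Technique (Kaczmarz iteration), illustrative sketch.

    rays[i] is a list of (pixel_index, weight) pairs describing which pixels
    ray i passes through; measurements[i] is its measured line integral.
    """
    x = [0.0] * n_pixels  # start from an empty image
    for _ in range(sweeps):
        for ray, b in zip(rays, measurements):
            # Forward-project the current estimate along this ray.
            proj = sum(w * x[j] for j, w in ray)
            norm2 = sum(w * w for _, w in ray)
            if norm2 == 0.0:
                continue
            # Back-project the residual, spreading it over the ray's pixels.
            step = relax * (b - proj) / norm2
            for j, w in ray:
                x[j] += step * w
    return x

# Hypothetical 2x2 phantom [[1, 2], [3, 4]] stored row-major as x0..x3.
rays = [
    [(0, 1.0), (1, 1.0)],  # top row
    [(2, 1.0), (3, 1.0)],  # bottom row
    [(0, 1.0), (2, 1.0)],  # left column
    [(1, 1.0), (3, 1.0)],  # right column
    [(0, 1.0), (3, 1.0)],  # main diagonal
]
measurements = [3.0, 7.0, 4.0, 6.0, 5.0]
image = art_reconstruct(rays, measurements, n_pixels=4)
```

Every update touches only the pixels on one ray, so the cost of a sweep grows with the number of rays times the pixels per ray; at clinical detector and volume sizes this repeated projection work is the computational burden that makes IRT slow.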

Dec, 26

Spatio-temporal upsampling on the GPU

Pixel processing is becoming increasingly expensive for real-time applications due to the complexity of today’s shaders and high-resolution framebuffers. However, most shading results are spatially or temporally coherent, which allows for sparse sampling and reuse of neighboring pixel values. This paper proposes a simple framework for spatio-temporal upsampling on modern GPUs. In contrast to previous […]

OpenGL

Dec, 26

Automatic Pose Estimation for Range Images on the GPU

Object pose (location and orientation) estimation is a common task in many computer vision applications. Although many methods exist, most algorithms need manual initialization and lack robustness to illumination variation, appearance change, and partial occlusions. We propose a fast method for automatic pose estimation without manual initialization based on shape matching of a 3D model […]

Dec, 26

Parallel High Resolution Real-time Visual Hull On GPU

In this paper we present an efficient high resolution image based visual hull (IBVH) algorithm that entirely runs in real-time on a single consumer graphics card. The target application is a real-time 3D video conferencing system. One major contribution of this paper is a novel caching strategy for the reduction of line segment intersection tests. […]

CUDA

Dec, 26

Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems

The Local Outlier Factor (LOF) is a very powerful anomaly detection method available in machine learning and classification. The algorithm defines the notion of local outlier in which the degree to which an object is outlying is dependent on the density of its local neighborhood, and each object can be assigned an LOF which represents […]

CUDA
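The LOF definition summarized above can be made concrete with a brute-force sketch: each point's local reachability density is the inverse of its mean reachability distance to its k nearest neighbors, and its LOF is the average ratio of its neighbors' densities to its own. This is an O(n^2) CPU illustration, not the paper's GPU kernel; `local_outlier_factor` is a hypothetical name, and for simplicity the neighborhood is exactly k points (the original definition also includes ties at the k-distance).

```python
import math

def local_outlier_factor(points, k):
    """Brute-force Local Outlier Factor scores for a list of coordinate tuples.

    Simplification: the neighborhood is exactly the k nearest points (ties broken
    by index), and duplicate points (zero reachability distance) are not handled.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point, excluding the point itself.
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(others[:k])

    # k-distance: distance to the k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(i, o):
        # Reachability distance of i from o: never smaller than o's k-distance.
        return max(k_distance[o], dist[i][o])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, o) for o in neighbors[i]) for i in range(n)]

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = local_outlier_factor(points, k=2)
```

Scores close to 1 indicate a point whose density matches that of its neighbors, while the isolated point at (10, 10) scores far above 1. The pairwise distance matrix dominates the cost, which is the part a GPU implementation would typically parallelize.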

Dec, 25

Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function

We investigate the numerical computation of the matrix sign function of large-scale dense matrices. This is a common task in various application areas. The main computational work in Newton’s iteration for the matrix sign function consists of matrix inversion. Therefore, we investigate the performance of two approaches for matrix inversion based on Gaussian (LU factorization) […]

CUDA
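Newton's iteration for the matrix sign function is X_{k+1} = (X_k + X_k^{-1}) / 2 with X_0 = A; it converges to sign(A) when A has no eigenvalues on the imaginary axis, and each step needs one matrix inverse, which is why inversion is the main cost. The sketch below is a small pure-Python illustration that uses Gauss-Jordan elimination with partial pivoting for the inverse; it does not reproduce the paper's hybrid CPU-GPU inversion, and `mat_inv` and `matrix_sign` are hypothetical names.

```python
def mat_inv(a):
    """Gauss-Jordan inversion with partial pivoting (plain lists, small matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot magnitude for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matrix_sign(a, tol=1e-12, max_iter=100):
    """Newton iteration X_{k+1} = (X_k + X_k^{-1}) / 2, starting from X_0 = A."""
    x = [list(map(float, row)) for row in a]
    for _ in range(max_iter):
        xinv = mat_inv(x)  # the dominant cost of each step
        nxt = [[0.5 * (u + v) for u, v in zip(r1, r2)] for r1, r2 in zip(x, xinv)]
        diff = max(abs(u - v) for r1, r2 in zip(nxt, x) for u, v in zip(r1, r2))
        x = nxt
        if diff < tol * max(1.0, max(abs(v) for row in x for v in row)):
            break
    return x

result = matrix_sign([[2.0, 1.0], [0.0, -1.0]])
```

For this upper-triangular example with eigenvalues 2 and -1, the iteration converges quadratically to [[1, 2/3], [0, -1]], whose square is the identity.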

Dec, 25

Sketching MLS Image Deformations On the GPU

In this paper, we present an image editing tool that allows the user to deform images using a sketch-based interface. The user simply sketches a set of source curves in the input image, and also some target curves that the source curves should be deformed to. Then the moving least squares (MLS) deformation technique [SMW06] […]

CUDA
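The moving least squares deformation cited as [SMW06] computes, for every image point v, a best-fit transformation weighted toward nearby control points. A minimal sketch of its affine variant for point handles is shown below, assuming weights w_i = 1 / |p_i - v|^(2*alpha); the paper derives its handles from sketched curves, which this sketch does not model, and `mls_affine` is a hypothetical name.

```python
def mls_affine(v, p, q, alpha=1.0):
    """Affine moving-least-squares deformation of point v (illustrative sketch).

    p: source control points, q: their target positions, both lists of (x, y).
    """
    w = []
    for pi, qi in zip(p, q):
        d2 = (pi[0] - v[0]) ** 2 + (pi[1] - v[1]) ** 2
        if d2 == 0.0:
            return tuple(qi)  # v sits on a handle: it maps exactly to the target
        w.append(1.0 / d2 ** alpha)
    sw = sum(w)
    # Weighted centroids p* and q*.
    psx = sum(wi * pi[0] for wi, pi in zip(w, p)) / sw
    psy = sum(wi * pi[1] for wi, pi in zip(w, p)) / sw
    qsx = sum(wi * qi[0] for wi, qi in zip(w, q)) / sw
    qsy = sum(wi * qi[1] for wi, qi in zip(w, q)) / sw
    # A = sum w p^T p and B = sum w p^T q on centered points (row-vector convention).
    a11 = a12 = a22 = 0.0
    b11 = b12 = b21 = b22 = 0.0
    for wi, pi, qi in zip(w, p, q):
        hx, hy = pi[0] - psx, pi[1] - psy
        gx, gy = qi[0] - qsx, qi[1] - qsy
        a11 += wi * hx * hx; a12 += wi * hx * hy; a22 += wi * hy * hy
        b11 += wi * hx * gx; b12 += wi * hx * gy
        b21 += wi * hy * gx; b22 += wi * hy * gy
    det = a11 * a22 - a12 * a12  # zero if all handles are collinear
    i11, i12, i22 = a22 / det, -a12 / det, a11 / det
    dx, dy = v[0] - psx, v[1] - psy
    # r = (v - p*) A^{-1}, then f(v) = r B + q*.
    rx = dx * i11 + dy * i12
    ry = dx * i12 + dy * i22
    return (rx * b11 + ry * b21 + qsx, rx * b12 + ry * b22 + qsy)

handles = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [(2 * x + 1, 2 * y + 2) for x, y in handles]  # scale by 2, shift by (1, 2)
moved = mls_affine((0.3, 0.7), handles, targets)
```

Because the affine MLS solution reproduces any affine map exactly, moving all handles by a uniform scale and translation moves every point the same way, which makes a convenient sanity check. Each pixel or grid vertex can evaluate this function independently, which suits per-thread GPU evaluation.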

Dec, 25

Implementation of the Lucas-Kanade image registration algorithm on a GPU for 3D computational platform stabilisation

Image registration forms the basis of many computer vision tasks. The Lucas-Kanade image registration algorithm is known to efficiently solve the sub-problem of rigid image registration. It is therefore often used in image stabilisation applications. This paper presents the details of a real-time implementation of the Lucas-Kanade image registration algorithm on a Graphics Processing Unit […]

OpenGL
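Lucas-Kanade registration linearizes the image around the current warp estimate and solves small normal equations (a Gauss-Newton step) for the parameter update. The sketch below restricts the warp to a 2D translation for brevity (rigid registration would add a rotation parameter), uses bilinear sampling and central-difference gradients, and is a CPU illustration rather than the paper's GPU implementation; `make_blob`, `sample` and `lucas_kanade_translation` are hypothetical names.

```python
import math

def make_blob(w, h, cx, cy, sigma):
    """Hypothetical test image: an isotropic Gaussian blob stored as rows [y][x]."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]

def sample(img, x, y):
    """Bilinear interpolation with coordinates clamped to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bot = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bot

def lucas_kanade_translation(template, image, iters=50, margin=3, tol=1e-6):
    """Estimate p = (px, py) such that image(x + p) ~= template(x)."""
    h, w = len(template), len(template[0])
    px = py = 0.0
    for _ in range(iters):
        # Warp the input image by the current translation estimate.
        warped = [[sample(image, x + px, y + py) for x in range(w)] for y in range(h)]
        hxx = hxy = hyy = bx = by = 0.0
        for y in range(margin, h - margin):
            for x in range(margin, w - margin):
                # Central-difference gradients of the warped image.
                gx = 0.5 * (warped[y][x + 1] - warped[y][x - 1])
                gy = 0.5 * (warped[y + 1][x] - warped[y - 1][x])
                err = template[y][x] - warped[y][x]
                hxx += gx * gx; hxy += gx * gy; hyy += gy * gy
                bx += gx * err; by += gy * err
        det = hxx * hyy - hxy * hxy
        if abs(det) < 1e-12:
            break  # textureless region: the 2x2 system is singular
        # Solve the 2x2 normal equations H * dp = b.
        dpx = (hyy * bx - hxy * by) / det
        dpy = (hxx * by - hxy * bx) / det
        px += dpx; py += dpy
        if math.hypot(dpx, dpy) < tol:
            break
    return px, py

template = make_blob(24, 24, 11.0, 11.0, 3.0)
image = make_blob(24, 24, 10.0, 13.0, 3.0)  # same blob, displaced by (-1, +2)
px, py = lucas_kanade_translation(template, image)
```

Each iteration is dominated by per-pixel work (warping, gradients and the sums that form the 2x2 system), which is the kind of work that maps naturally onto fragment shaders followed by a reduction.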

Dec, 25

Multi-domain, Higher Order Level Set Scheme for 3D Image Segmentation on the GPU

Level set method based segmentation provides an efficient tool for topological and geometrical shape handling. Conventional level set surfaces are only C^0 continuous since the level set evolution involves linear interpolation to compute derivatives. Bajaj et al. present a higher order method to evaluate level set surfaces that are C^2 continuous, but are slow due […]

CUDA

Dec, 25

A GPU Framework for the Visualization and On-the-Fly Amplification of Real Terrains

This paper describes a GPU framework for the real-time visualization of natural textured terrains, as well as the steps that are needed to populate them on-the-fly with tens of thousands of plant and/or mineral objects. Our main contribution is a robust modular architecture developed for the G80 and later GPUs, that performs texture/seed selection and […]

OpenGL

Dec, 25

A fast high quality pseudo random number generator for nVidia CUDA

Previously, due either to hardware GPU limits or to older versions of software, careful implementation of PRNGs was required to make good use of the limited numerical precision available on graphics cards. Newer nVidia G80 and Tesla hardware supports double precision. This is available to high level programmers via CUDA. This allows a much simpler C++ […]

CUDA
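As a hedged illustration of why double precision simplifies PRNG code (not necessarily the generator used in the paper), the Park-Miller "minimal standard" generator seed' = 16807 * seed mod (2^31 - 1) can be computed exactly in IEEE double precision: the product is below 2^46, well inside the 53-bit mantissa, so the usual integer workaround (Schrage's method) is unnecessary. `park_miller_next` and `uniform01` are hypothetical names; on a GPU each thread would keep its own seed.

```python
import math

MULTIPLIER = 16807.0    # 7**5, the Park-Miller "minimal standard" multiplier
MODULUS = 2147483647.0  # 2**31 - 1, a Mersenne prime

def park_miller_next(seed):
    """Advance the generator one step using only double-precision arithmetic.

    seed must be an integer-valued float in [1, MODULUS - 1]. The product is
    below 2**46, so it is exact in a double, and fmod is exact as well.
    """
    return math.fmod(MULTIPLIER * seed, MODULUS)

def uniform01(seed):
    """Return (new_seed, sample) with sample in the open interval (0, 1)."""
    seed = park_miller_next(seed)
    return seed, seed / MODULUS

seed = 1.0
for _ in range(10000):
    seed = park_miller_next(seed)
print(int(seed))  # 1043618065, the published check value for this generator
```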

Dec, 25

Simulating Photon Mapping for Real-time Applications

This paper introduces a novel method for simulating photon mapping for real-time applications. First we introduce a new method for selectively redistributing photons. Then we describe a method for selectively updating the indirect illumination. The indirect illumination is calculated using a new GPU accelerated final gathering method and the illumination is then stored in light […]

OpenGL
