high performance computing on graphics processing units: hgpu.org

Posts

Dec, 25

Fast Approximation of High-Order Voronoi Diagrams and Distance Transforms on the GPU

We present a graphics hardware implementation of the tangent-plane algorithm for computing the kth-order Voronoi diagram of a set of point sites in image space. Correct and efficient implementation of this algorithm using graphics hardware is possible only with the use of an appropriate shader program on the GPU. This is achieved by rendering in […]

OpenGL

Dec, 25

Fourier Volume Rendering on the GPU Using a Split-Stream-FFT

The Fourier volume rendering technique operates in the frequency domain and creates line integral projections of a 3D scalar field. These projections can be efficiently generated in O(N^2 log N) by utilizing the Fourier Slice-Projection theorem. However, until now, the mathematical difficulty of the Fast Fourier Transform prevented acceleration by graphics hardware and therefore limited […]

Dec, 24

Ray Casting Deformable Models on the GPU

The GPUs pack high computation power and a restricted architecture into easily available hardware today. They are now used as computation co-processors and come with programming models that treat them as standard parallel architectures. We explore the problem of realtime ray casting of large deformable models (over a million triangles) on large displays (a million […]

CUDA

Dec, 24

The GPU on biomedical image processing for color and phenotype analysis

The computational power and memory bandwidth of graphics processing units (GPUs) have turned them into attractive platforms for general-purpose applications. In this paper, we exploit this power in the context of biomedical image processing by establishing a cooperative environment between the CPU and the GPU. We deal with phenotype and color analysis on a wide […]

OpenGL

Dec, 24

GPU algorithms for radiosity and subsurface scattering

We capitalize on recent advances in modern programmable graphics hardware, originally designed to support advanced local illumination models for shading, to instead perform two different kinds of global illumination models for light transport. We first use the new floating-point texture map formats to find matrix radiosity solutions for light transport in a diffuse environment, and […]

OpenGL

Dec, 24

A code motion technique for accelerating general-purpose computation on the GPU

Graphics processing units (GPUs) are providing increasingly higher performance with programmable internal processors, namely vertex processors (VPs) and fragment processors (FPs). Such newly added capabilities motivate us to perform general-purpose computation on GPUs (GPGPU) beyond graphics applications. Although VPs and FPs are connected in a pipeline, many GPGPU implementations utilize only FPs as a computational […]

OpenGL

Dec, 24

Information Visualization of Multi-dimensional Cellular Automata using GPU Programming

We propose a method for generating all possible rules of multi-dimension Boolean cellular automata (CA). Based on an original encoding method and the programming of graphical processor units (GPU), this method allows us to visualize the CA information flow in real-time so that emerging behaviors can be easily identified. Algorithms of first and von Neumann […]

OpenGL

Dec, 24

Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture

This work presents an implementation of neocognitron neural network, using a high performance computing architecture based on GPU (graphics processing unit). Neocognitron is an artificial neural network, proposed by Fukushima and collaborators, constituted of several hierarchical stages of neuron layers, organized in two-dimensional matrices called cellular planes. For the high performance computation of face recognition […]

CUDA

Dec, 24

Singular value decomposition on GPU using CUDA

Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high performance co-processors due to their tremendous computing power. In this paper, we present the implementation of singular value decomposition (SVD) of a dense matrix on GPU using the CUDA programming […]

Dec, 24

A high-speed multi-GPU implementation of bottom-up attention using CUDA

In this paper a novel implementation of the saliency map model on a multi-GPU platform using CUDA technology is presented. The saliency map model is a well-known computational model for bottom-up attention selection and serves as a basis of many attention control strategies of cognitive vision systems. A real-time implementation is the prerequisite of an […]

CUDA

Dec, 24

Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function

We are currently faced with the situation where applications have increasing computational demands and there is a wide selection of parallel processor systems. In this paper we focus on exploiting fine-grain parallelism for a demanding bioinformatics application – MrBayes – and its phylogenetic likelihood functions (PLF) using different architectures. Our experiments compare side-by-side the scalability […]

CUDA

Dec, 24

GPU Based Real-Time Instrument Tracking with Three Dimensional Ultrasound

Real-time three-dimensional ultrasound enables new intracardiac surgical procedures, but the distorted appearance of instruments in ultrasound poses a challenge to surgeons. This paper presents a detection technique that identifies the position of the instrument within the ultrasound volume. The algorithm uses a form of the generalized Radon transform to search for long straight objects in […]

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container
