high performance computing on graphics processing units: hgpu.org

Posts

Feb, 6

Focused Volumetric Visual Hull with Color Extraction

This paper introduces a new approach for volumetric visual hull reconstruction, using a voxel grid that focuses on the moving target object. This grid is continuously updated as a function of object location, orientation, and size. The benefit is a reduced number of voxels that have to be evaluated or allocated towards capturing the target […]

CUDA

Feb, 5

Regular Expression Matching on Graphics Hardware for Intrusion Detection

The expressive power of regular expressions has often been exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexible pattern matching functionality of regular expressions in these systems comes with significant overheads in terms of both memory and CPU cycles, since every byte of the inspected input needs to be […]

CUDA

Feb, 5

Interactive water streams with sphere scan conversion

Fluid simulations require efficient dynamics, surface extraction and rendering in order to achieve real-time interaction. We present a novel technique for the surface extraction of stream-shaped fluid simulations represented as particles. Typical surface extraction methods for particles combine implicit function evaluation with the marching cubes algorithm. In our approach, we dynamically update vertex positions […]

OpenGL

Feb, 5

Software Pipelined Execution of Stream Programs on GPUs

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism […]

CUDA
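The filter-and-channel structure described in the entry above can be illustrated with a small Python sketch. This is not StreamIt syntax and not the paper's GPU scheduling; it only assumes that each filter consumes items from one channel and produces items on another, with Python generators standing in for channels. The filter names (`source`, `scale`, `moving_sum`) are hypothetical.

```python
def source(count):
    """Hypothetical source filter: pushes the integers 0..count-1 downstream."""
    for i in range(count):
        yield i


def scale(stream, factor):
    """Stateless filter: one item in, one item out (a data-parallel stage)."""
    for item in stream:
        yield item * factor


def moving_sum(stream, window):
    """Stateful filter: emits the sum of the last `window` items it has seen."""
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) == window:
            yield sum(buffer)
            buffer.pop(0)


def run_pipeline(count, factor, window):
    """Connect the three filters into a pipeline and collect the output."""
    return list(moving_sum(scale(source(count), factor), window))
```

Calling `run_pipeline(5, 2, 2)` chains the three filters. The stateless stage could process items independently, while the stateful stage has to see its input in order, which is the kind of distinction a stream graph makes explicit.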

Feb, 5

Stochastic transparency

Stochastic transparency provides a unified approach to order-independent transparency, anti-aliasing, and deep shadow maps. It augments screen-door transparency using a random sub-pixel stipple pattern, where each fragment of transparent geometry covers a random subset of pixel samples of size proportional to alpha. This results in correct alpha-blended colors on average, in a single render pass […]

OpenGL
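The claim in the entry above, that random sample coverage proportional to alpha gives correct alpha-blended colors on average, can be checked with a minimal Python sketch. It assumes a single pixel with scalar colors, a per-sample depth test, and fragments given as hypothetical `(depth, alpha, color)` tuples; it is an illustration of the idea, not the paper's implementation.

```python
import random


def stochastic_pixel(fragments, background, num_samples, rng):
    """Resolve one pixel: each fragment covers a random subset of samples
    whose size is proportional to its alpha; the nearest fragment wins per sample."""
    depth = [float("inf")] * num_samples
    color = [background] * num_samples
    for z, alpha, c in fragments:  # fragment order does not matter
        covered = round(alpha * num_samples)
        for s in rng.sample(range(num_samples), covered):
            if z < depth[s]:
                depth[s] = z
                color[s] = c
    return sum(color) / num_samples


def exact_blend(fragments, background):
    """Reference result: sorted back-to-front alpha blending."""
    result = background
    for z, alpha, c in sorted(fragments, key=lambda f: -f[0]):
        result = alpha * c + (1.0 - alpha) * result
    return result


def average_estimate(fragments, background, num_samples, trials, seed=0):
    """Average many independent stochastic resolves of the same pixel."""
    rng = random.Random(seed)
    total = sum(stochastic_pixel(fragments, background, num_samples, rng)
                for _ in range(trials))
    return total / trials
```

A single call to `stochastic_pixel` is noisy; averaging over trials (or using more samples per pixel) moves the estimate toward `exact_blend` without ever sorting the fragments.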

Feb, 5

NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs

Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be […]

CUDA

Feb, 5

Model-driven autotuning of sparse matrix-vector multiply on GPUs

We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPU). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for GPUs, identifying key GPU-specific performance limitations, enhancements, and tuning opportunities. These implementations, which include variants on […]

CUDA
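For reference, the SpMV operation named in the entry above computes y = A x for a sparse matrix A. The abstract's list of storage formats is cut off, so the sketch below assumes the common compressed sparse row (CSR) layout; the function and argument names are hypothetical and the code is a sequential Python baseline, not one of the tuned GPU kernels.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x with A stored in CSR form.

    row_ptr[r]..row_ptr[r+1] delimits the nonzeros of row r;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):  # rows are independent of each other
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y
```

Because each row's dot product is independent, the outer loop is the natural unit of parallel work; how rows are assigned to threads is one of the choices an autotuner could explore.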

Feb, 5

Real-Time Face Pose Estimation from Single Range Images

We present a real-time algorithm to estimate the 3D pose of a previously unseen face from a single range image. Based on a novel shape signature to identify noses in range images, we generate candidates for their positions, and then generate and evaluate many pose hypotheses in parallel using modern graphics processing units (GPUs). We […]

CUDA

Feb, 5

QR decomposition on GPUs

QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and an upper triangular matrix R. Adaptive systems commonly employ QR decomposition to solve overdetermined least squares problems. Performance of QR decomposition is typically the crucial factor limiting problem sizes. Graphics Processing Units (GPUs) […]

CUDA
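The use of QR for overdetermined least squares mentioned in the entry above can be sketched in plain Python. The abstract does not say which QR algorithm the paper uses, so this example assumes modified Gram-Schmidt, a real matrix with full column rank, and the thin factorization (Q with as many columns as A); the function names are hypothetical.

```python
import math


def qr_gram_schmidt(A):
    """Thin QR by modified Gram-Schmidt. Returns Q as a list of columns and R."""
    m, n = len(A), len(A[0])
    columns = [[A[i][j] for i in range(m)] for j in range(n)]
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        for i, q in enumerate(Q_cols):
            # Project out each earlier direction from the current remainder.
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        Q_cols.append([vk / R[j][j] for vk in v])
    return Q_cols, R


def least_squares(A, b):
    """Minimize ||A x - b|| by solving R x = Q^T b with back substitution."""
    Q_cols, R = qr_gram_schmidt(A)
    n = len(R)
    rhs = [sum(qk * bk for qk, bk in zip(q, b)) for q in Q_cols]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = rhs[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x
```

For example, fitting a line through four points uses a 4-by-2 matrix whose rows are `[1, t]`; `least_squares` then returns the intercept and slope.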

Feb, 5

Real-Time Prediction of Brain Shift Using Nonlinear Finite Element Algorithms

Patient-specific biomechanical models implemented using specialized nonlinear (i.e. taking into account material and geometric nonlinearities) finite element procedures were applied to predict the deformation field within the brain for five cases of craniotomy-induced brain shift. The procedures utilize the Total Lagrangian formulation with explicit time stepping. The loading was defined by prescribing deformations on the […]

CUDA

Feb, 5

Pixel-Exact Rendering of Spacetime Finite Element Solutions

Computational simulation of time-varying physical processes is of fundamental importance for many scientific and engineering applications. Most frequently, time-varying simulations are performed over multiple spatial grids at discrete points in time. We investigate a new approach to time-varying simulation: spacetime discontinuous Galerkin finite element methods. The result of this simulation method is a simplicial tessellation […]

OpenGL

Feb, 4

GPGPU-compatible archive based stochastic ranking evolutionary algorithm (G-ASREA) for multi-objective optimization

In this paper, a GPGPU (general purpose graphics processing unit) compatible Archive-based Stochastic Ranking Evolutionary Algorithm (G-ASREA) is proposed that ranks the population with respect to an archive of non-dominated solutions. It reduces the complexity of the deterministic ranking operator from O(mn^2) to O(man) and further speeds up ranking on GPU. Experiments compare G-ASREA […]
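The complexity reduction stated above follows from comparing each individual only against the archive instead of against the whole population. The sketch below is one way to picture that, assuming m minimized objectives, a population of size n, an archive of size a, and a rank defined as the number of archive members that dominate an individual. That rank definition and the function names are assumptions for illustration, not necessarily the paper's exact operator.

```python
def dominates(u, v):
    """True if u is no worse than v in every objective and strictly better in one
    (all objectives are minimized)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def rank_against_archive(population, archive):
    """Rank each individual by how many archive members dominate it.

    Cost is about n * a dominance checks of m comparisons each, i.e. O(man),
    versus O(mn^2) when every individual is compared with every other one.
    """
    return [sum(dominates(member, individual) for member in archive)
            for individual in population]
```

Each individual's rank depends only on the archive, so the n rank computations are independent of one another and can be evaluated in parallel.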

