high performance computing on graphics processing units: hgpu.org

Posts

Dec, 7

Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor

Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) […]

Dec, 7

A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets

We present an adaptive out-of-core technique for rendering massive scalar volumes employing single-pass GPU ray casting. The method is based on the decomposition of a volumetric dataset into small cubical bricks, which are then organized into an octree structure maintained out-of-core. The octree contains the original data at the leaves, and a filtered representation of […]

Dec, 7

Vector graphics depicting marbling flow

We present an efficient framework for generating marbled textures that can be exported into a vector graphics format based on an explicit surface tracking method (see Figure 1). The proposed method enables artists to create complex and realistic marbling textures that can be used for design purposes. Our algorithm is unique in that the marbling […]

CUDA

Dec, 7

A Real-Time Multigrid Finite Hexahedra Method for Elasticity Simulation using CUDA

We present a multigrid approach for simulating elastic deformable objects in real time on recent NVIDIA GPU architectures. To accurately simulate large deformations we consider the co-rotated strain formulation. Our method is based on a finite element discretization of the deformable object using hexahedra. It draws upon recent work on multigrid schemes for the efficient […]

CUDA

Dec, 7

GPU-based Monte Carlo simulation in neutron transport and finite differences heat equation evaluation

Graphics Processing Units (GPUs) are high-performance co-processors originally intended to improve the use and quality of computer graphics applications. Since researchers and practitioners realized the potential of GPUs for general-purpose computing, their application has been extended to fields outside computer graphics. The main objective of this work is to evaluate […]

Dec, 7

Simulation of Coarse-Grained Protein-Protein Interactions with Graphics Processing Units

We report a hybrid parallel implementation, on central and graphics processing units (CPU-GPU), of a coarse-grained model for replica exchange Monte Carlo (REMC) simulations of protein assemblies. We describe the design, optimization, validation, and benchmarking of our algorithms, particularly the parallelization strategy, which is specific to the requirements of GPU hardware. Performance evaluation of our hybrid […]

Dec, 6

Massive parallel LDPC decoding on GPU

Low-Density Parity-Check (LDPC) codes are powerful error-correcting codes (ECC). They have recently been adopted by several data communication standards such as DVB-S2 and WiMAX. LDPC codes are represented by bipartite graphs, also called Tanner graphs, and their decoding is computationally very intensive. For that reason, dedicated VLSI architectures have been investigated and developed over the […]

CUDA

Dec, 6

GPU-MEME: Using Graphics Hardware to Accelerate Motif Finding in DNA Sequences

Discovery of motifs that are repeated in groups of biological sequences is a major task in bioinformatics. Iterative methods such as expectation maximization (EM) are used as a common approach to find such patterns. However, corresponding algorithms are highly compute-intensive due to the small size and degenerate nature of biological motifs. Runtime requirements are likely […]

OpenGL

Dec, 6

Parallel SimRank computation on large graphs with iterative aggregation

Recently there has been a lot of interest in graph-based analysis. One of its most important aspects is measuring the similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph-theoretical model. However, existing methods for SimRank computation suffer from two […]

CUDA

Dec, 6

BLAS Comparison on FPGA, CPU and GPU

High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on […]

CUDA

Dec, 6

Skinning with dual quaternions

Skinning of skeletally deformable models is extensively used for real-time animation of characters, creatures and similar objects. The standard solution, linear blend skinning, has some serious drawbacks that require artist intervention. Therefore, a number of alternatives have been proposed in recent years. All of them successfully combat some of the artifacts, but none challenge the […]

Dec, 6

Real-Time Visibility-Based Fusion of Depth Maps

We present a viewpoint-based approach for the quick fusion of multiple stereo depth maps. Our method selects depth estimates for each pixel that minimize violations of visibility constraints and thus remove errors and inconsistencies from the depth maps to produce a consistent surface. We advocate a two-stage process in which the first stage generates potentially […]

OpenGL

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)