high performance computing on graphics processing units: hgpu.org

Posts

Jan, 12

Compensated Visual Hull for Defective Segmentation and Occlusion

We propose an advanced visual hull technique to compensate for outliers using the reliabilities of the silhouettes. The proposed method consists of a foreground extraction technique based on the Generalized Gaussian Family model and a compensated shape-from-silhouette algorithm. They are connected by the intra-/inter-silhouette reliabilities to compensate for carving errors from defective segmentation or partial […]
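The carving step that shape-from-silhouette methods share can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm. It assumes orthographic views along the three coordinate axes, and the per-silhouette `reliabilities` weights and the `threshold` vote are hypothetical stand-ins for the paper's intra-/inter-silhouette reliabilities. With all weights equal and `threshold=1.0`, it reduces to the classic strict intersection of silhouette cones.

```python
from itertools import product

def project(voxel, axis):
    """Orthographic projection along one coordinate axis (simplifying assumption)."""
    x, y, z = voxel
    return [(y, z), (x, z), (x, y)][axis]

def carve(grid_size, silhouettes, reliabilities=None, threshold=1.0):
    """Keep a voxel if the reliability-weighted share of silhouettes that
    contain its projection reaches `threshold`.

    silhouettes:   list of (axis, set of 2D pixels inside the silhouette)
    reliabilities: hypothetical per-silhouette weights (default: all 1.0)
    """
    if reliabilities is None:
        reliabilities = [1.0] * len(silhouettes)
    total = sum(reliabilities)
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        score = sum(w for (axis, sil), w in zip(silhouettes, reliabilities)
                    if project(voxel, axis) in sil)
        if score >= threshold * total - 1e-9:  # small tolerance for float sums
            hull.add(voxel)
    return hull

# A 2x2x2 cube occupying indices 1..2 of a 4x4x4 grid.
square = {(a, b) for a in (1, 2) for b in (1, 2)}
views = [(0, square), (1, square), (2, square)]
# The third silhouette is defective: one foreground pixel was lost.
defective = [(0, square), (1, square), (2, square - {(1, 1)})]

print(len(carve(4, views)))                            # 8
print(len(carve(4, defective)))                        # 6
print(len(carve(4, defective, [1.0, 1.0, 0.2], 0.9)))  # 8
```

With the defective silhouette, strict intersection carves away a column of true voxels, while down-weighting the unreliable view lets them survive without admitting any voxel outside the cube.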

Jan, 12

Automatic Hepatic Vessel Segmentation Using Graphics Hardware

The accurate segmentation of liver vessels is an important prerequisite for creating oncologic surgery planning tools as well as medical visualization applications. In this paper, a fully automatic approach is presented to quickly enhance and extract the vascular system of the liver from CT datasets. Our framework consists of three basic modules: vessel enhancement on […]

Jan, 12

Robust mesh reconstruction from unoriented noisy points

In this paper, we present a robust method for generating mesh surfaces from unoriented, noisy points. The procedure consists of three steps. First, the normal vector at each point is estimated with a highly robust estimator that can fit a surface supported by fewer than half of the data points and can handle data containing multiple structures. This […]
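The abstract does not name its robust estimator, so the sketch below swaps in RANSAC plane fitting, a standard robust method that can also recover a surface supported by fewer than half of the points, because it maximizes the consensus set instead of minimizing a least-squares error. The tolerance, iteration count, and helper names are assumptions for illustration. The sign of the returned normal is arbitrary, which fits the unoriented setting.

```python
import math
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ransac_normal(points, tol=0.01, iterations=200, seed=0):
    """Estimate an unoriented normal for a neighborhood of 3D points with RANSAC."""
    rng = random.Random(seed)
    best_normal, best_count = None, -1
    for _ in range(iterations):
        p, q, r = rng.sample(points, 3)        # minimal sample defining a plane
        n = cross(sub(q, p), sub(r, p))
        length = math.sqrt(dot(n, n))
        if length < 1e-12:                     # collinear sample: no unique plane
            continue
        n = (n[0] / length, n[1] / length, n[2] / length)
        # Consensus set: points within `tol` of the candidate plane.
        count = sum(1 for x in points if abs(dot(sub(x, p), n)) < tol)
        if count > best_count:
            best_normal, best_count = n, count
    return best_normal, best_count

# 12 points on the plane z = 0 plus 14 scattered outliers (inliers < 50%).
plane = [(0.1 * i, 0.1 * j, 0.0) for i in range(4) for j in range(3)]
noise_rng = random.Random(42)
outliers = [tuple(noise_rng.uniform(-1, 1) for _ in range(3)) for _ in range(14)]
normal, support = ransac_normal(plane + outliers)
print(normal, support)
```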

CUDA

Jan, 12

A Variational Model for Interactive Shape Prior Segmentation and Real-Time Tracking

In this paper, we introduce a semi-automated segmentation method based on minimizing the Geodesic Active Contour energy incorporating a shape prior. We increase the robustness of the segmentation result using the additional shape information that represents the desired structure. Furthermore, the user can take corrective actions during the segmentation and adapt the […]

CUDA

Jan, 12

High-Quality Rendering of Varying Isosurfaces with Cubic Trivariate C1-Continuous Splines

Smooth trivariate splines on uniform tetrahedral partitions are well suited for high-quality visualization of isosurfaces from scalar volumetric data. We propose a novel rendering approach based on spline patches with low total degree, for which ray-isosurface intersections are computed using efficient root finding algorithms. Smoothly varying surface normals are directly extracted from the underlying spline […]
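Restricting a trivariate polynomial to a ray o + t·d yields a univariate polynomial in t (of degree at most 3 for a cubic patch), so finding the ray-isosurface hit is a 1D root-finding problem. The sketch below is a generic stand-in, not the paper's root finder: it samples the ray for a sign change and refines the bracket by bisection, for any scalar function. The step count and tolerance are assumptions, and uniform sampling can miss two roots that fall within a single step (for example, a grazing hit).

```python
def first_hit(f, origin, direction, iso, t_max, steps=256, tol=1e-10):
    """Smallest t in [0, t_max] with f(origin + t*direction) == iso, or None."""
    def g(t):
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        z = origin[2] + t * direction[2]
        return f(x, y, z) - iso

    dt = t_max / steps
    t0, g0 = 0.0, g(0.0)
    for i in range(1, steps + 1):
        if g0 == 0.0:                          # landed exactly on the surface
            return t0
        t1 = i * dt
        g1 = g(t1)
        if g0 * g1 < 0.0:                      # sign change: root bracketed in [t0, t1]
            lo, hi, glo = t0, t1, g0
            while hi - lo > tol:               # bisection refinement
                mid = 0.5 * (lo + hi)
                gm = g(mid)
                if (gm < 0.0) == (glo < 0.0):
                    lo, glo = mid, gm
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        t0, g0 = t1, g1
    return t0 if g0 == 0.0 else None

# The unit sphere as the iso = 1 level set of x^2 + y^2 + z^2.
sphere = lambda x, y, z: x * x + y * y + z * z
print(first_hit(sphere, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0), 1.0, 6.0))  # ~2.0
print(first_hit(sphere, (0.0, 2.0, -3.0), (0.0, 0.0, 1.0), 1.0, 6.0))  # None
```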

CUDA

Jan, 12

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

Heterogeneous multiprocessors are increasingly important in the multi-core era due to their potential for high performance and energy efficiency. In order for software to fully realize this potential, the step that maps computations to processing elements must be as automated as possible. However, the state-of-the-art approach is to rely on the programmer to specify this […]
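One way to automate the mapping step is to fit a simple performance model for each processing element from a few timed training runs, then split the work so that both finish at the same time. The sketch below assumes each device's execution time is linear in the amount of work it receives (T = a + b·n) and solves a_c + b_c·β·N = a_g + b_g·(1 − β)·N for the CPU's share β. The linear model, function names, and timings are illustrative assumptions, not figures or code from the paper.

```python
def fit_linear(samples):
    """Least-squares fit of time = a + b * size from (size, time) training runs."""
    n = len(samples)
    sx = sum(s for s, _ in samples)
    sy = sum(t for _, t in samples)
    sxx = sum(s * s for s, _ in samples)
    sxy = sum(s * t for s, t in samples)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def cpu_fraction(cpu_model, gpu_model, total):
    """Fraction of `total` work items for the CPU so both devices finish together."""
    a_c, b_c = cpu_model
    a_g, b_g = gpu_model
    # Solve a_c + b_c*beta*N = a_g + b_g*(1 - beta)*N for beta.
    beta = (a_g - a_c + b_g * total) / ((b_c + b_g) * total)
    return min(1.0, max(0.0, beta))            # clamp: one device may take everything

# Hypothetical timings (size, ms): CPU ~4 ms/item; GPU ~100 ms overhead + 1 ms/item.
cpu_samples = [(100, 400.0), (200, 800.0), (400, 1600.0)]
gpu_samples = [(100, 200.0), (200, 300.0), (400, 500.0)]
beta = cpu_fraction(fit_linear(cpu_samples), fit_linear(gpu_samples), 1000)
print(beta)  # 0.22
```

With these numbers the CPU gets 220 items (880 ms) and the GPU gets 780 items (100 + 780 = 880 ms), so neither device idles while the other finishes.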

CUDA

Jan, 12

Inertial-aided KLT feature tracking for a moving camera

We propose a novel inertial-aided KLT feature tracking method that is robust to camera ego-motion. The conventional KLT tracker uses images only, so it inherently works only when the appearance change between images is small. When large optical flow is induced by camera ego-motion, an inertial sensor attached to the camera can provide a good prediction to […]
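When inter-frame motion is dominated by camera rotation, a gyroscope-derived rotation R predicts where a feature moves: back-project the pixel to a viewing ray with the intrinsics K, rotate the ray, and re-project it (the infinite homography x' ~ K R K⁻¹ x). The prediction can then seed the KLT search in place of the previous position. This is a sketch of inertial prediction in general, not the paper's exact predictor. It assumes a pinhole camera with focal lengths `fx`, `fy` and principal point (`cx`, `cy`), that R maps ray directions from the previous camera frame into the current one, and it ignores translation.

```python
import math

def predict_pixel(u, v, R, fx, fy, cx, cy):
    """Predict a feature's new pixel position under a pure camera rotation R (3x3)."""
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)                        # K^-1 [u, v, 1]
    r = [sum(R[i][k] * ray[k] for k in range(3)) for i in range(3)]  # R * ray
    if r[2] <= 0.0:
        return None                        # the point rotated behind the camera
    return fx * r[0] / r[2] + cx, fy * r[1] / r[2] + cy  # K * (R ray), dehomogenized

def rot_y(theta):
    """Rotation by `theta` radians about the camera's y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# A feature at the principal point after a 0.1 rad rotation about y:
# it shifts horizontally by fx * tan(0.1), about 50 px here.
print(predict_pixel(320.0, 240.0, rot_y(0.1), 500.0, 500.0, 320.0, 240.0))
```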

CUDA

Jan, 12

Using graphics processing units to generate random numbers

The future of high-performance computing is moving toward the efficient use of highly parallel computing environments. One application where massive parallelism comes naturally is Monte Carlo simulation, in which a large number of independent events must be simulated. At the core of a Monte Carlo simulation lies the random number generator […]
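On a GPU, each thread needs its own random stream that does not overlap with the others. A simple way to get this is a counter-based generator: each value is a pure function of (seed, stream, counter), so any thread computes its numbers independently with no shared state. The sketch uses the SplitMix64 mixing function as the hash. That choice, the 32/32-bit key layout, and the π example are illustrative assumptions, not the generators evaluated in the paper, and this generator is not suitable for cryptographic use.

```python
MASK64 = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15

def mix64(z):
    """SplitMix64 finalizer: scrambles a 64-bit integer."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def uniform(seed, stream, counter):
    """Uniform double in [0, 1) determined only by (seed, stream, counter)."""
    key = ((stream & 0xFFFFFFFF) << 32) | (counter & 0xFFFFFFFF)  # unique key per draw
    z = (seed + (key + 1) * GOLDEN_GAMMA) & MASK64
    return (mix64(z) >> 11) * (1.0 / (1 << 53))  # top 53 bits -> [0, 1)

def estimate_pi(seed, n_streams, samples_per_stream):
    """Monte Carlo estimate of pi; each stream plays the role of one GPU thread."""
    inside = 0
    for stream in range(n_streams):            # on a GPU these would run in parallel
        for k in range(samples_per_stream):
            x = uniform(seed, stream, 2 * k)
            y = uniform(seed, stream, 2 * k + 1)
            inside += x * x + y * y < 1.0
    return 4.0 * inside / (n_streams * samples_per_stream)

print(estimate_pi(seed=2024, n_streams=4, samples_per_stream=5000))
```

Because no thread advances shared state, the result is identical whether the streams run in parallel or in sequence, which makes GPU runs reproducible.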

CUDA

Jan, 12

Performance potential for simulating spin models on GPU

Graphics processing units (GPUs) are increasingly being used for general-purpose computation. This development is motivated by their theoretical peak performance, which significantly exceeds that of broadly available CPUs. For practical purposes, however, it is far from clear how much of this theoretical performance can be realized in actual scientific applications. […]
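A common way to map spin-model simulations onto a GPU is the checkerboard decomposition of the Metropolis update for the 2D Ising model: sites of one color interact only with sites of the other color, so an entire color can be updated in parallel. The sketch below emulates the two half-sweeps serially. The nearest-neighbor Ising model, the coupling J = 1, and the parameters are assumptions for illustration, not necessarily the models benchmarked in the paper. The lattice size must be even so that the coloring stays consistent across the periodic boundary.

```python
import math
import random

def energy(spins, J=1.0):
    """Total energy of an L x L periodic Ising lattice, counting each bond once."""
    L = len(spins)
    return -J * sum(spins[i][j] * (spins[(i + 1) % L][j] + spins[i][(j + 1) % L])
                    for i in range(L) for j in range(L))

def checkerboard_sweep(spins, beta, rng, J=1.0):
    """One Metropolis sweep as two half-sweeps over the 'black' and 'white' sites."""
    L = len(spins)
    if L % 2:
        raise ValueError("checkerboard update needs an even lattice size")
    for color in (0, 1):
        # Same-color sites are never neighbors, so on a GPU every site of this
        # color could be updated by its own thread at the same time.
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
                      spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                dE = 2.0 * J * spins[i][j] * nb    # energy change if this spin flips
                if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
                    spins[i][j] = -spins[i][j]

L = 8
spins = [[1] * L for _ in range(L)]
print(energy(spins))  # -128.0
rng = random.Random(1)
for _ in range(50):
    checkerboard_sweep(spins, beta=1.0, rng=rng)
print(sum(map(sum, spins)) / L**2)  # magnetization per spin, close to 1 at T = 1
```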

CUDA

Jan, 11

Understanding the design trade-offs among current multicore systems for numerical computations

In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application designers in better mapping their software to the most suitable architecture, with an additional goal of influencing future computing system design. We specifically examine five architectures, based on: the Intel […]

CUDA

Jan, 11

A memory optimization technique for software-managed scratchpad memory in GPUs

With the emergence of massively parallel and inexpensive platforms such as the G80 generation of NVIDIA GPUs, more real-life applications will be designed for or ported to these platforms. This requires structured transformation methods that remove existing application bottlenecks on these platforms. Balancing the usage of on-chip resources, used for improving the application performance, in these […]
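Software-managed scratchpads (shared memory on NVIDIA GPUs) pay off when data staged on chip is reused many times. The sketch below emulates the classic tiling pattern for matrix multiplication in plain Python and counts global-memory loads: staging T×T tiles cuts them from 2n³ to 2n³/T. It illustrates the general reuse idea that scratchpad optimizations target, not the transformation proposed in the paper. The tile size is an assumption; on real hardware it is bounded by the scratchpad capacity and the other on-chip resources it competes with.

```python
def tiled_matmul(A, B, tile):
    """C = A @ B for square n x n lists, staging tiles in a 'scratchpad'.

    Returns (C, global_loads), where global_loads counts reads of A and B.
    """
    n = len(A)
    if n % tile:
        raise ValueError("n must be a multiple of the tile size")
    C = [[0.0] * n for _ in range(n)]
    global_loads = 0
    for bi in range(0, n, tile):               # one GPU thread block per output tile
        for bj in range(0, n, tile):
            acc = [[0.0] * tile for _ in range(tile)]
            for bk in range(0, n, tile):
                # Load one tile of A and one of B into the scratchpad.
                sA = [[A[bi + i][bk + k] for k in range(tile)] for i in range(tile)]
                sB = [[B[bk + k][bj + j] for j in range(tile)] for k in range(tile)]
                global_loads += 2 * tile * tile
                # Each staged value is now reused `tile` times from on-chip memory.
                for i in range(tile):
                    for j in range(tile):
                        acc[i][j] += sum(sA[i][k] * sB[k][j] for k in range(tile))
            for i in range(tile):
                for j in range(tile):
                    C[bi + i][bj + j] = acc[i][j]
    return C, global_loads

n = 8
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i - j for j in range(n)] for i in range(n)]
C, loads = tiled_matmul(A, B, tile=4)
print(loads, 2 * n**3)  # 256 1024
```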

CUDA

Jan, 11

Interactive Volume Rendering of Functional Representations in Quantum Chemistry

Simulation and computation in chemistry have improved as computational power has increased over the decades. Many types of chemistry simulation results are available, from atomic-level bonding to volumetric representations of electron density. However, tools for visualizing the results of quantum chemistry computations are still limited to showing atomic bonds and isosurfaces […]

