high performance computing on graphics processing units: hgpu.org

Posts

Aug, 31

A framework for volume segmentation and visualization using Augmented Reality

We propose a two-handed direct manipulation system to achieve complex volume segmentation of CT/MRI data in Augmented Reality with a remote controller attached to a motion tracking cube. At the same time, segmented data is displayed by direct volume rendering using a programmable GPU. Our system achieves visualization of real-time modification of volume data […]

Aug, 31

Reliability modeling of MEMS devices on CUDA based HPC setup

In this paper, we have reviewed the development of CUDA and the implementation of the various distributions used in reliability modeling of MEMS-based devices on a CUDA setup. The various distributions can be highly optimized so that the system can be simulated efficiently on CUDA. We have shown that the type of distribution may vary […]

CUDA

Aug, 31

Multicore bundle adjustment

We present the design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems. We explore the use of multicore CPUs as well as multicore GPUs for this purpose. We show that overcoming the severe memory and bandwidth limitations of current generation […]

CUDA

Aug, 31

FastMag: Fast micromagnetic simulator for complex magnetic structures

A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and can handle multiscale problems with a high computational efficiency. The simulator derives its high performance from efficient methods for evaluating the effective field and from implementations on massively parallel graphics processing unit (GPU) architectures. FastMag discretizes the computational domain […]

CUDA

Aug, 31

Hyperfast Perspective Cone–Beam Backprojection

Tomographic image reconstruction, such as the reconstruction of CT projection values, of tomosynthesis data, PET or SPECT events, is computationally very demanding. The most time-consuming step is the backprojection, which is often limited by the memory bandwidth. Recently, a novel general purpose architecture optimized for distributed computing became available: the Cell Broadband Engine (CBE). Its […]

Aug, 31

Accelerating popular tomographic reconstruction algorithms on commodity PC graphics hardware

The task of reconstructing an object from its projections via tomographic methods is a time-consuming process due to the vast complexity of the data. For this reason, manufacturers of equipment for medical computed tomography (CT) rely mostly on special application-specific integrated circuits (ASICs) to obtain the fast reconstruction times required in clinical settings. Although […]

Aug, 31

Real-Time Adaptive Radiometric Compensation

Recent radiometric compensation techniques make it possible to project images onto colored and textured surfaces. This is realized with projector-camera systems by scanning the projection surface on a per-pixel basis. Using the captured information, a compensation image is calculated that neutralizes geometric distortions and color blending caused by the underlying surface. As a result, the […]

Aug, 31

Bundled depth-map merging for multi-view stereo

Depth-map merging is one typical technique category for multi-view stereo (MVS) reconstruction. To guarantee accuracy, existing algorithms usually require either sub-pixel level stereo matching precision or continuous depth-map estimation. The merging of inaccurate depth-maps remains a challenging problem. This paper introduces a bundle optimization method for robust and accurate depth-map merging. In the method, depth-maps […]

Aug, 31

Partial wave analysis at BES III harnessing the power of GPUs

Partial wave analysis is a core tool in hadron spectroscopy. With the high statistics data available at facilities such as the Beijing Spectrometer III, this procedure becomes computationally very expensive. We have successfully implemented a framework for performing partial wave analysis on graphics processors. We discuss the implementation, the parallel computing frameworks employed and the […]

OpenCL

Aug, 31

Partial Wave Analysis using Graphics Cards

Partial wave analysis is a key technique in hadron spectroscopy. The use of unbinned likelihood fits on large statistics data samples and ever more complex physics models makes this analysis technique computationally very expensive. Parallel computing techniques, in particular the use of graphics processing units, are a powerful means to speed up analyses; in the […]

OpenCL

Aug, 31

Volume exploration using ellipsoidal Gaussian transfer functions

This paper presents an interactive transfer function design tool based on ellipsoidal Gaussian transfer functions (ETFs). Our approach explores volumetric features in the statistical space by modeling the space using the Gaussian mixture model (GMM) with a small number of Gaussians to maximize the likelihood of feature separation. Instant visual feedback is possible by mapping […]

Aug, 31

FPGA based Speeded Up Robust Features

We present an implementation of the Speeded Up Robust Features (SURF) algorithm on a Field Programmable Gate Array (FPGA). The SURF algorithm extracts salient points from an image and computes descriptors of their surroundings that are invariant to scale, rotation and illumination changes. The interest point detection and feature descriptor extraction algorithm is often used as the […]

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems