high performance computing on graphics processing units: hgpu.org

Posts

Feb, 8

VolQD: Direct Volume Rendering of Multi-million Atom Quantum Dot Simulations

In this work we present a hardware-accelerated direct volume rendering system for visualizing multivariate wave functions in semiconducting quantum dot (QD) simulations. The simulation data contains the probability density values of multiple electron orbitals for up to tens of millions of atoms, computed by the NEMO3-D quantum device simulator software run on large-scale cluster architectures. […]

OpenGL

Feb, 8

Automatic C-to-CUDA Code Generation for Affine Programs

Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded parallel programming model, facilitating high performance implementations of general-purpose computations. However, the explicitly managed memory hierarchy and multi-level parallel view make manual development of high-performance CUDA code rather complicated. Hence the automatic transformation of sequential input programs into efficient […]

CUDA

Feb, 8

Automatic program parallelization for multicore processors

With the advent of multi-core processors, the problem of designing applications that can efficiently utilize their performance is becoming more and more important. Moreover, developing programs for these processors requires programmers to have additional, specific knowledge of the processor architecture. In multi-core systems, efficient program execution is the main issue. It can even happen that […]

Feb, 8

Partitioning streaming parallelism for multi-cores: a machine learning based approach

Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal […]

Feb, 8

Real-Time Simulation and Visualization of Subject-Specific 3D Lung Dynamics

In this paper we discuss a framework for modeling the 3D lung dynamics of normal and diseased human subjects and visualizing them using an Augmented Reality (AR) based environment. The framework is based on the results obtained from pulmonary function tests and lung image-data of human subjects obtained from 4D High-Resolution Computed Tomography (HRCT). The […]

Feb, 8

Towards On-Line Digital Doubles

We present a modular system for real-time 3D-scanning of human bodies under motion. The high-resolution shape and colour appearance is captured by several scanning units positioned around the object of interest. Each of these units performs a foreground-background segmentation and computes a valid depth-range for the spatially neighbouring units. Multiple depth-ranges are combined in a […]

Feb, 8

Programmable shaders for deformation rendering

In this paper, we present a method for rendering deformations as part of the programmable shader pipeline of contemporary Graphical Processing Units. Our method allows general deformations, including cuts. Previous approaches to deformation treat the GPU as a general-purpose processor for computing vertex displacement. With the advent of vertex […]

Feb, 7

Data parallel acceleration of decision support queries using Cell/BE and GPUs

Decision Support System (DSS) workloads are known to be among the most time-consuming database workloads because they process large data sets. Traditionally, DSS queries have been accelerated using large-scale multiprocessors. This work analyzes the benefits of using high-performance/low-cost processors such as GPUs and the Cell/BE to accelerate DSS […]

Feb, 7

Fitting multi-planet transit models to photometric time-data series by evolution strategies

In this paper we present the application of an evolution strategy to the problem of detecting multi-planet transit events in photometric time-data series. Planetary transits occur when a planet regularly eclipses its host star, reducing stellar luminosity. The transit method is amongst the most successful detection methods for exoplanets and is presently performed by space […]

CUDA

Feb, 7

Learning to Detect Roads in High-Resolution Aerial Images

Reliably extracting information from aerial imagery is a difficult problem with many practical applications. One specific case of this problem is the task of automatically detecting roads. This task is a difficult vision problem because of occlusions, shadows, and a wide variety of non-road objects. Despite 30 years of work on automatic road detection, no […]

CUDA

Feb, 7

Axel: a heterogeneous cluster with FPGAs and GPUs

This paper describes a heterogeneous computer cluster called Axel. Axel contains a collection of nodes; each node can include multiple types of accelerators such as FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units). A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing […]

CUDA

Feb, 7

High-Quality Point-Based Rendering on Modern GPUs

In recent years, point-based rendering has been shown to offer the potential to outperform traditional triangle-based rendering in both speed and visual quality when it comes to processing highly complex models. Existing surface splatting techniques achieve superior visual quality through proper filtering, but they are still limited in rendering speed. On the other […]

OpenGL
