high performance computing on graphics processing units: hgpu.org

Posts

Dec, 11

Volume and Isosurface Rendering with GPU-Accelerated Cell Projection

We present an efficient GPU-based implementation of the Projected Tetrahedra (PT) algorithm. By reducing most of the CPU-GPU data transfer, the algorithm achieves interactive frame rates (up to 2.0 M Tets/s) on current graphics hardware. Since no topology information is stored, it requires substantially less memory than recent interactive ray casting approaches. The method uses […]

OpenGL

Dec, 11

Distributed Texture Memory in a Multi-GPU Environment

In this paper we present a consistent, distributed, shared memory system for GPU texture memory. This model enables the virtualization of texture memory and the transparent, scalable sharing of texture data across multiple GPUs. Textures are stored as pages, and as textures are read or written, our system satisfies requests for pages on demand while […]

Dec, 11

Zippy: A Framework for Computation and Visualization on a GPU Cluster

Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large scale general-purpose computation and visualization applications. However, the programming model for high performance general-purpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy framework, a general and scalable solution to this problem. It abstracts […]

Dec, 11

GPU accelerated radio astronomy signal convolution

The increasing array size of radio astronomy interferometers is causing the associated computation to scale quadratically with the number of array signals. Consequently, efficient usage of alternate processing architectures should be explored in order to meet this computational challenge. Affordable parallel processors have been made available to the general scientific community in the form of […]

CUDA

Dec, 11

Program optimization carving for GPU computing

Contemporary many-core processors such as the GeForce 8800 GTX enable application developers to utilize various levels of parallelism to enhance the performance of their applications. However, iterative optimization for such a system may lead to a local performance maximum, due to the complexity of the system. We propose program optimization carving, a technique that begins […]

Dec, 11

Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA

Molecular dynamics is an important computational tool to simulate and understand biochemical processes at the atomic level. However, accurate simulation of processes such as protein folding requires a large number of both atoms and time steps. This in turn leads to huge runtime requirements. Hence, finding fast solutions is of highest importance to research. In […]

CUDA

Dec, 11

Displacement Mapping on the GPU – State of the Art

This paper reviews the latest developments of displacement mapping algorithms implemented on the vertex, geometry, and fragment shaders of graphics cards. Displacement mapping algorithms are classified as per-vertex and per-pixel methods. Per-pixel approaches are further categorized into safe algorithms that aim at correct solutions in all cases and unsafe techniques that may fail in extreme […]

Dec, 11

GPU-boosted online image matching

Matching feature points between images is a key step in many computer vision tasks. As the number of images increases, this rapidly becomes a bottleneck. Here we show how to use the power of GPUs to perform image matching in typically 20 ms for one thousand points. This speedup makes applications like interactive image matching […]

OpenGL

Dec, 11

Future graphics architectures

Graphics architectures are in the midst of a major transition. In the past, these were specialized architectures designed to support a single rendering algorithm: the standard Z buffer. Realtime 3D graphics has now advanced to the point where the Z-buffer algorithm has serious shortcomings for generating the next generation of higher-quality visual effects demanded by […]

Dec, 11

The impact of accelerator processors for high-throughput molecular modeling and simulation

The recent introduction of cost-effective accelerator processors (APs), such as the IBM Cell processor and Nvidia’s graphics processing units (GPUs), represents an important technological innovation which promises to unleash the full potential of atomistic molecular modeling and simulation for the biotechnology industry. Present APs can deliver over an order of magnitude more floating-point operations per […]

Dec, 10

The 20th International ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2011

HPDC is the premier computer science conference for presenting new results relating to large scale high performance and distributed systems used in science and industry. For twenty years, HPDC has been at the center of new discoveries in clusters, grids, clouds, and parallel and multicore computers.

Dec, 10

3rd Workshop on using Emerging Parallel Architectures (WEPA) in conjunction with International Conference on Computational Science, ICCS 2011

The computing landscape has undergone significant transformation with the emergence of more powerful processing elements such as GPUs, FPGAs, Cell B.E., multi-cores, etc. On the multi-core front, Moore’s Law has transcended beyond the single processor boundary with the prediction that the number of cores will double every 18 months. Going forward, the primary method of […]
