high performance computing on graphics processing units: hgpu.org

Posts

Nov, 28

42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N^2), the present method calculates the O(N log N) treecode and O(N) fast multipole method (FMM) […]

CUDA
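For comparison with the treecode and FMM, here is a minimal sketch of the direct O(N^2) summation that the abstract says many earlier GPU N-body codes used. It assumes units with G = 1 and a Plummer softening length `eps`, which is a hypothetical parameter added for illustration. Every ordered pair of bodies is visited, so the cost grows quadratically with N. Treecode and FMM avoid this by replacing groups of distant bodies with multipole approximations.

```python
import math

def direct_accelerations(pos, mass, eps=1e-3):
    """Gravitational acceleration on each body by direct O(N^2) summation.

    pos  : list of (x, y, z) tuples
    mass : list of masses
    eps  : softening length (assumption), avoids the singularity as r -> 0
    Units assume G = 1.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):  # every pair is visited: O(N^2) work
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            s = mass[j] / (r2 * math.sqrt(r2))  # m_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

Two unit masses one unit apart with `eps=0.0` pull on each other with unit acceleration in opposite directions.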

Nov, 28

Real-time restoration algorithm based on one-dimensional Wiener filters for different rates of image motion blur

To eliminate side-oblique image motion, a fast image restoration algorithm is proposed for implementation on aerial camera systems. When an aerial camera works at a side-oblique angle, parallel image motion at many different rates occurs simultaneously on the focal plane array. Through analysis of how different rates of parallel image motion blur are generated and […]
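The excerpt does not give the paper's filter design. As a generic illustration of one-dimensional Wiener deconvolution, the sketch below models linear motion blur as a uniform (box) kernel of `length` samples and restores the signal in the frequency domain with W = conj(H) / (|H|^2 + nsr). The noise-to-signal constant `nsr`, the circular boundary and the naive O(n^2) DFT are simplifying assumptions; a practical version would use an FFT and could pick a different `length` for each image row according to its motion rate.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def blur(signal, length):
    """Circular 1-D motion blur: mean of the current and previous length-1 samples."""
    n = len(signal)
    return [sum(signal[(t - s) % n] for s in range(length)) / length for t in range(n)]

def wiener_deblur(blurred, length, nsr=1e-3):
    """Restore a box-blurred 1-D signal with W = conj(H) / (|H|^2 + nsr)."""
    n = len(blurred)
    psf = [1.0 / length if t < length else 0.0 for t in range(n)]  # box PSF, zero-padded
    H = dft(psf)
    G = dft(blurred)
    F = [h.conjugate() * g / (abs(h) ** 2 + nsr) for h, g in zip(H, G)]
    return [v.real for v in idft(F)]
```

For a noise-free rectangular pulse blurred over 5 samples, the restored signal is much closer to the original than the blurred one; with real noise, `nsr` trades sharpness against noise amplification.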

Nov, 28

A shared-scene-graph image-warping architecture for VR: Low latency versus image quality

Designing low end-to-end latency system architectures for virtual reality is still an open and challenging problem. We describe the design, implementation and evaluation of a client-server depth-image warping architecture that updates and displays the scene graph at the refresh rate of the display. Our approach works for scenes consisting of dynamic and interactive objects. The […]

OpenGL
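The excerpt does not detail the warping step itself. As a generic illustration, not the paper's code, depth-image warping reprojects each pixel of a rendered reference frame into the current viewpoint using its stored depth: back-project through the inverse pinhole intrinsics, apply the relative camera motion, and project again. The intrinsics `K`, the pose `(R, t)` and the function name `warp_pixel` are assumptions for this sketch; a GPU version would run the same arithmetic per pixel in a shader and would also need to fill disocclusion holes, which is ignored here.

```python
def warp_pixel(u, v, depth, K, R, t):
    """Reproject pixel (u, v) with known depth from a reference view into a new view.

    K    : 3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    R, t : rotation (3x3) and translation (3,) taking reference-camera
           coordinates to new-camera coordinates
    Returns (u', v', depth') in the new view.
    """
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    # Back-project to a 3-D point in the reference camera frame.
    X = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    # Rigid transform into the new camera frame.
    Y = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    # Perspective projection into the new image.
    return fx * Y[0] / Y[2] + cx, fy * Y[1] / Y[2] + cy, Y[2]
```

With an identity pose the pixel maps to itself; a small sideways camera translation shifts it by f * t_x / z pixels, so nearer pixels move more, which is the parallax the warp has to reproduce.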

Nov, 28

On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs

Expectation Maximization (EM) and the Simultaneous Iterative Reconstruction Technique (SIRT) are two iterative computed tomography reconstruction algorithms often used when the data contain a high amount of statistical noise, have been acquired from a limited angular range, or have a limited number of views. A popular mechanism to increase the rate of convergence of these […]

CUDA
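The abstract is cut off before it names the acceleration mechanism, though the title refers to ordered subsets. As a hedged sketch, the code applies the textbook SIRT update x ← x + C Aᵀ R (b − A x), where R and C hold the inverse row and column sums of the non-negative system matrix A, one subset of rows (projections) at a time. Updating the image after each subset, rather than once per full pass over the data, is the usual source of faster convergence. The dense list-of-lists matrix, the function name `os_sirt` and the fixed iteration count are illustrative assumptions, not the paper's GPU implementation.

```python
def os_sirt(A, b, subsets, iterations=50):
    """Ordered-subsets SIRT for A x = b with non-negative A.

    A        : list of rows (lists of floats), one row per ray/projection
    b        : measured projection values, one per row
    subsets  : list of lists of row indices, visited in order every iteration
    """
    n = len(A[0])
    x = [0.0] * n
    row_sum = [sum(row) for row in A]  # diagonal of R^-1
    for _ in range(iterations):
        for rows in subsets:
            # Column sums restricted to this subset (diagonal of C^-1).
            col_sum = [sum(A[i][j] for i in rows) for j in range(n)]
            # Row-normalized residuals R (b - A x), all computed before x changes.
            resid = {}
            for i in rows:
                if row_sum[i] > 0:
                    ax = sum(A[i][j] * x[j] for j in range(n))
                    resid[i] = (b[i] - ax) / row_sum[i]
                else:
                    resid[i] = 0.0
            # Back-project and normalize by column sums: x += C A^T R (b - A x).
            for j in range(n):
                if col_sum[j] > 0:
                    x[j] += sum(A[i][j] * resid[i] for i in rows) / col_sum[j]
    return x
```

On a tiny consistent 4 x 3 system split into two subsets of two rows each, the iterate converges to the exact solution [1, 2, 3].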

Nov, 28

Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach

Low-Density Parity-Check (LDPC) codes are powerful error correcting codes adopted by recent communication standards. LDPC decoders are based on belief propagation algorithms, which make use of a Tanner graph and very intensive message-passing computation, and usually require hardware-based dedicated solutions. With the exponential increase of the computational power of commodity graphics processing units (GPUs), […]
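The excerpt names belief propagation over a Tanner graph as the core of the decoder. Below is a minimal sketch of one common simplification, min-sum decoding with a flooding schedule; it is not the paper's stream-based GPU kernel, and the name `min_sum_decode` is hypothetical. Within each half-iteration every check-node (or variable-node) update is independent of the others, which is the parallelism a GPU decoder can exploit. The tiny (7,4) Hamming parity-check matrix in the usage example stands in for a real, much larger LDPC matrix.

```python
def min_sum_decode(H, llr, max_iter=20):
    """Min-sum belief propagation on the Tanner graph of parity-check matrix H.

    H   : list of rows of 0/1 ints, one row per check node (each of degree >= 2)
    llr : channel log-likelihood ratios; positive means bit 0 is more likely
    Returns (hard-decision bits, iterations used).
    """
    m, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    # Variable-to-check messages start as the channel LLRs.
    q = {(i, j): llr[j] for i in range(m) for j in checks[i]}
    bits = [0 if v >= 0 else 1 for v in llr]
    for it in range(1, max_iter + 1):
        # Check-node update: sign product and minimum magnitude of the other inputs.
        r = {}
        for i in range(m):
            for j in checks[i]:
                others = [q[(i, k)] for k in checks[i] if k != j]
                sign = 1.0
                for v in others:
                    if v < 0:
                        sign = -sign
                r[(i, j)] = sign * min(abs(v) for v in others)
        # Variable-node update: channel LLR plus all incoming check messages.
        total = list(llr)
        for (i, j), v in r.items():
            total[j] += v
        bits = [0 if t >= 0 else 1 for t in total]
        # Stop as soon as every parity check is satisfied (zero syndrome).
        if all(sum(bits[j] for j in checks[i]) % 2 == 0 for i in range(m)):
            return bits, it
        # Extrinsic messages: exclude the message received from the same check.
        for (i, j) in q:
            q[(i, j)] = total[j] - r[(i, j)]
    return bits, max_iter
```

If the all-zero codeword is sent and only the last bit arrives with a weakly wrong sign, the single check containing that bit outvotes the channel and the decoder returns all zeros after one iteration.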

Nov, 28

Time-varying clustering for local lighting and material design

This paper presents an interactive graphics processing unit (GPU)-based relighting system in which local lighting conditions, surface materials and viewing direction can all be changed on the fly. To support these changes, we simulate the light transport process at run time, which is normally impractical for interactive use due to its huge computational burden. […]

Nov, 28

Shader-based tessellation to save memory bandwidth in a mobile multimedia processor

In this paper, we propose a tessellation hardware architecture that saves memory bandwidth in a mobile multimedia processor. To reduce the implementation overhead, floating-point computations of tessellation are accelerated by the conventional GPU pipeline, and only tessellation-specific control logic is handled by an additional hardware unit. Tightly coupled with a vertex shader, the additional […]

Nov, 28

Complexity effective memory access scheduling for many-core accelerator architectures

Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwidth. This is especially important in graphics processing unit (GPU) architectures, where the large quantity of parallelism places a heavy demand on the memory system. The logic needed for out-of-order scheduling […]

CUDA
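The excerpt is cut off before it describes the scheduler, but the out-of-order policy most commonly used to maximize row locality is FR-FCFS (first-ready, first-come-first-served). The sketch below is a deliberately simplified model of it, assuming no DRAM timing, one request issued per step and an open-page policy per bank; `Request` and `schedule` are hypothetical names. It shows why out-of-order selection raises row-buffer hits, and also why such logic can be expensive: each step scans the whole pending queue for a row hit, which in hardware means many parallel comparisons.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int  # arrival order (unique per request)
    bank: int
    row: int

def schedule(queue, num_banks, policy="frfcfs"):
    """Return (service order as arrival ids, number of row-buffer hits).

    Simplified model (assumption): one request per step, no DRAM timing,
    each bank keeps its most recently accessed row open.
    """
    pending = sorted(queue, key=lambda r: r.arrival)  # copy; caller's list untouched
    open_row = [None] * num_banks
    order, hits = [], 0
    while pending:
        pick = pending[0]  # FCFS choice: the oldest request
        if policy == "frfcfs":
            # First-ready: prefer the oldest request that hits an open row.
            for req in pending:
                if open_row[req.bank] == req.row:
                    pick = req
                    break
        if open_row[pick.bank] == pick.row:
            hits += 1
        open_row[pick.bank] = pick.row
        pending.remove(pick)
        order.append(pick.arrival)
    return order, hits
```

For four requests to one bank alternating between rows 1 and 2, FCFS gets no row hits, while FR-FCFS reorders them as [0, 2, 1, 3] and gets two.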

Nov, 27

2011 Symposium on Application Accelerators in High Performance Computing (SAAHPC’11)

What do GPUs, FPGAs, vector processors and other exotic special-purpose chips have in common? They are advanced processor architectures that the scientific community is using to accelerate computationally demanding applications. While high-performance computing systems that use application accelerators are still rare, they will be the norm rather than the exception in the near future. The […]

Nov, 27

Aurally and visually enhanced audio search with soundtorch

Finding a specific or an artistically appropriate sound in a vast collection comprising thousands of audio files containing recordings of, say, footsteps, gunshots, and thunderclaps easily becomes a chore. To improve on this, we have developed an enhanced auditory and graphical zoomable user interface that leverages the human brain’s capability to single out sounds from […]

Nov, 27

Interactive Pixel-Accurate Free Viewpoint Rendering from Images with Silhouette Aware Sampling

We present an integrated, fully GPU-based processing pipeline to interactively render new views of arbitrary scenes from calibrated but otherwise unstructured input views. In a two-step procedure, our method first generates for each input view a dense proxy of the scene using a new multi-view stereo formulation. Each scene proxy consists of a structured cloud […]

Nov, 27

A Massively Parallel Architecture for Bioinformatics

Today’s general-purpose computers fall short of the computing performance required by standard bioinformatics applications such as DNA sequence alignment, error correction for assembly, or transcription factor binding site (TFBS) finding. The size of DNA sequence databases doubles twice a year, whereas computing performance per unit cost only doubles every 2 years. […]
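Taking the two growth rates in the abstract at face value, a short calculation shows how fast the gap widens. Let D(t) be database size and P(t) computing performance per unit cost, both normalized to 1 at t = 0 (a normalization assumed for illustration), with t in years:

```latex
D(t) = 2^{2t}, \qquad P(t) = 2^{t/2}, \qquad
\frac{D(t)}{P(t)} = 2^{2t - t/2} = 2^{1.5t}
```

After one year the work per unit of affordable compute has grown by about 2^{1.5} ≈ 2.83, and after two years by 2^3 = 8.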
