high performance computing on graphics processing units: hgpu.org

Posts

Sep, 3

View-Dependent Streamlines for 3D Vector Fields

This paper introduces a new streamline placement and selection algorithm for 3D vector fields. Instead of considering the problem as a simple feature search in data space, we base our work on the observation that most streamline fields generate a lot of self-occlusion which prevents proper visualization. In order to avoid this issue, we approach […]

CUDA • OpenGL

Sep, 3

Utilizing Hierarchical Multiprocessing for Medical Image Registration

This work discusses an approach to utilize hierarchical multiprocessing in the context of medical image registration. By first organizing application parallelism into a domain-specific taxonomy, an algorithm is structured to target a set of multicore platforms. The approach on a cluster of graphics processing units (GPUs) requiring the use of two parallel programming environments to achieve […]

CUDA

Sep, 3

Interactive Reaction-Diffusion on Surface Tiles

This paper proposes to perform reaction-diffusion on surface tiles. The square tiles fit nicely and cost-effectively in GPU memory, whereas we also apply distortion minimization on tiles so as to precisely reduce the unbalanced scale and resolution problem of chemicals in the reaction-diffusion. The interconnection nature of tiles accounts for the surface topology, and […]

OpenGL

Sep, 3

High-Performance Computing with Accelerators

This issue of CiSE is based on work presented at the US National Science Foundation workshop, Path to Petascale: Adapting Geo/Chem/Astro Applications for Accelerators and Accelerator Clusters, held at the US National Center for Supercomputing Applications (NCSA) in early 2009. The workshop was designed to raise awareness about the emergence of accelerator-based high-performance computing (HPC) […]

Sep, 3

An 8.6 mW 25 Mvertices/s 400-MFLOPS 800-MOPS 8.91 mm² Multimedia Stream Processor Core for Mobile Applications

For the demands of mobile multimedia applications, a stream processor core is designed with 8.91 mm² area in 0.18 μm CMOS technology at 50 MHz. Several techniques and architectures are proposed to achieve high performance with low power consumption. First of all, an optimized core pipeline is designed with 2-issue VLIW architecture to achieve the […]

Sep, 3

Improving Scheduling Techniques in Heterogeneous Systems with Dynamic, On-Line Optimisations

Computational performance increasingly depends on parallelism, and many systems rely on heterogeneous resources such as GPUs and FPGAs to accelerate computationally intensive applications. However, implementations for such heterogeneous systems are often hand-crafted and optimised to one computation scenario, and it can be challenging to maintain high performance when application parameters change. In this paper, we […]

Sep, 3

Fused DTI/HARDI Visualization

High-angular resolution diffusion imaging (HARDI) is a diffusion weighted MRI technique that overcomes some of the decisive limitations of its predecessor, diffusion tensor imaging (DTI), in the areas of composite nerve fiber structure. Despite its advantages, HARDI raises several issues: complex modeling of the data, nonintuitive and computationally demanding visualization, inability to interactively explore and […]

Sep, 2

Using Graphics Processing Units for Logic Simulation of Electronic Designs

Logic simulation is the major verification technique used for electronic system designs. Speeding up logic simulation results in great savings and shorter time-to-market. We parallelize logic simulation using Graphics Processing Units (GPUs). We present a parallel cycle-based logic simulation algorithm that uses And Inverter Graphs (AIGs) as design representations. We partition the gates in the […]

CUDA
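To make the cycle-based AIG idea in the abstract above concrete, here is a minimal sequential sketch of evaluating an And-Inverter Graph for one combinational cycle. It is not the paper's GPU algorithm. The function name `simulate_aig`, the AIGER-style literal encoding, and the XOR example circuit are all assumptions chosen for illustration. In a GPU setting, gates with no dependency on one another could be evaluated in parallel; this sketch simply walks the gates in topological order.

```python
from typing import List, Sequence, Tuple


def simulate_aig(
    and_gates: Sequence[Tuple[int, int]],
    outputs: Sequence[int],
    input_values: Sequence[int],
) -> List[int]:
    """Evaluate an And-Inverter Graph for one cycle (hypothetical helper).

    Assumed encoding: literal = 2 * node + c, where c = 1 means "inverted".
    Node 0 is constant false, nodes 1..n are the primary inputs, and the
    AND gates follow in topological order.
    """
    # values[node] holds the 0/1 value of each node computed so far.
    values = [0] + [v & 1 for v in input_values]

    def lit_value(lit: int) -> int:
        v = values[lit >> 1]            # look up the referenced node
        return v ^ 1 if lit & 1 else v  # apply the inverter if flagged

    # Each gate reads only earlier nodes, so one forward pass suffices.
    for a, b in and_gates:
        values.append(lit_value(a) & lit_value(b))

    return [lit_value(o) for o in outputs]


# Example circuit (assumption): XOR of two inputs built from three AND gates.
# Node 3 = x1 & ~x2, node 4 = ~x1 & x2, node 5 = ~node3 & ~node4, out = ~node5.
XOR_GATES = [(2, 5), (3, 4), (7, 9)]
XOR_OUTPUTS = [11]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, simulate_aig(XOR_GATES, XOR_OUTPUTS, [x1, x2]))
```

Because an AIG contains only two-input AND gates and inverters, every gate runs the same instruction on different data, which is one reason the representation maps naturally onto data-parallel hardware.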

Sep, 2

Using GPU to exploit parallelism on cryptography

In this article we explore the computational power of NVIDIA graphics processing units (GPUs) for cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA eases general-purpose computing by exposing the parallel processing present in GPUs. To this end, NVIDIA GPU architectures and CUDA are presented, along with cryptography concepts. Furthermore, we do the comparison […]

CUDA

Sep, 2

Generalized Voronoi Diagram Computation on GPU

We study the problem of using the GPU to compute the generalized Voronoi diagram (GVD) for higher-order sites, such as line segments and curves. This problem has applications in many fields, including computer animation, pattern recognition and so on. A number of methods have been proposed that use the GPU to speed up the computation […]

OpenGL

Sep, 2

A GPU Accelerated Algorithm for Compressive Sensing Based Image Super-Resolution

This paper presents a parallel algorithm designed for Super-resolution Image Reconstruction based on Compressive sensing in the ATI Stream platform. In the accelerating process, we select part of the serial program as the objects to be sped up according to the execution time of each stage, set appropriate parallel granularity to make full use of […]

Sep, 2

GPU-accelerated time-domain circuit simulation

Time-domain circuit simulation is often dominated by the transistor model evaluation time. An analysis of a test suite of 27 circuits shows 66% of the transient runtime is spent evaluating the core BSIM4 transistor model code. A modern graphics processing unit (GPU) is a highly parallel, high performance computer suitable for non-graphics tasks. Circuit simulation […]
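The 66% figure in the abstract above bounds what accelerating model evaluation alone can deliver. The following is a minimal sketch using Amdahl's law. The function name `amdahl_speedup` and the sample acceleration factors are assumptions for illustration and do not come from the paper.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 66% of transient runtime is model evaluation (figure taken from the abstract).
MODEL_EVAL_FRACTION = 0.66

# Hypothetical acceleration factors for the model-evaluation portion only.
for factor in (2, 10, 100):
    overall = amdahl_speedup(MODEL_EVAL_FRACTION, factor)
    print(f"{factor}x faster model evaluation -> {overall:.2f}x overall")
```

However fast the model evaluation becomes, the overall speedup stays below 1 / (1 - 0.66) ≈ 2.94x unless the remaining 34% of the runtime is also reduced.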
