high performance computing on graphics processing units: hgpu.org

Posts

Apr, 5

GPGPU supported cooperative acceleration in molecular dynamics

Molecular dynamics simulations have become a significant computational approach to study complicated physical phenomena at the atomic level. Nevertheless, accurate simulations are limited in size and timescale by the available computing resources, which makes them very time-consuming. This consequently leads to tremendous computational requirements. Therefore, speeding up this process is crucial. […]

Apr, 5

Parallelizing Simulated Annealing-Based Placement Using GPGPU

Simulated annealing has become the de facto standard for FPGA placement engines since it provides high-quality solutions and is robust under a wide range of objective functions. However, this method will soon become prohibitive due to its sequential nature and because the performance of single-core processors has stagnated. General purpose computing on graphics processing […]

Apr, 5

GPGPU-FDTD method for 2-dimensional electromagnetic field simulation and its estimation

For signal/power integrity analysis of high-density packages and printed circuit boards, the FDTD (Finite-Difference Time-Domain) method has been widely used. To apply it to large-scale problems, a variety of acceleration techniques are required. This paper describes a GPGPU-FDTD (General Purpose computing on GPU (Graphics Processing Unit)-Finite-Difference Time-Domain) method for massively parallel electromagnetic […]

Apr, 4

A Case Study of SWIM: Optimization of Memory Intensive Application on GPGPU

Recently, GPGPU has been widely adopted in the High Performance Computing (HPC) field. The limited global memory bandwidth poses a great challenge to many GPGPU programmers trying to exploit parallelism within the CPU-GPU heterogeneous platform. In this paper, we choose SWIM, a typical memory-intensive application from the SPEC OMP 2001 benchmark suite, for case […]

Apr, 4

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications

We consider the problem of how to improve memory latency tolerance in massively multithreaded GPGPUs when the thread-level parallelism of an application is not sufficient to hide memory latency. One solution used in conventional CPU systems is prefetching, both in hardware and software. However, we show that straightforwardly applying such mechanisms to GPGPU systems does […]

CUDA

Apr, 4

GPGPU implementation of a synaptically optimized, anatomically accurate spiking network simulator

Simulation of biological spiking networks is becoming more relevant in understanding neuronal processes. An increasing proportion of these simulations focuses on large-scale modeling efforts. Unfortunately, the size of large networks is often limited by both computational power and memory. Computational power constrains both the maximum number of differential equations and the maximum number of […]

CUDA

Apr, 4

GPGPU-based Latency Insertion Method: Application to PDN simulations

With the progress of high-density circuit integration technology, a variety of signal and power integrity problems have become serious concerns in electronic design. This paper describes fast circuit simulation by GPGPU-LIM (GPGPU-based Latency Insertion Method). First, LIM, a fast algorithm, is reviewed. Next, implementation of LIM on the […]

Apr, 4

Migrating real-time depth image-based rendering from traditional to next-gen GPGPU

This paper focuses on the current revolution in using the GPU for general-purpose computations (GPGPU), and how to maximally exploit its powerful resources. Recently, the advent of next-generation GPGPU replaced the traditional way of exploiting the graphics hardware. We have migrated real-time depth image-based rendering – for use in contemporary 3DTV technology – and noticed […]

Apr, 4

The optimization of parallel Smith-Waterman sequence alignment using on-chip memory of GPGPU

Memory optimization is an important strategy for achieving high performance in sequence alignment implemented with CUDA on GPGPU. The Smith-Waterman (SW) algorithm is the most sensitive algorithm widely used for local sequence alignment, but it is very time-consuming. Although several parallel methods have been used in some studies and have shown good performance, advantages of GPGPU memory hierarchy […]

CUDA

Apr, 4

Parallel connected-component labeling algorithm for GPGPU applications

This paper proposes a new connected component labeling algorithm for GPGPU applications based on NVIDIA’s CUDA. Various approaches and algorithms for connected component labeling with minimal execution time have been designed, but most of them have focused on optimizing CPU algorithms. Therefore, it is hard to apply these approaches to GPGPU programming models such […]

CUDA

Apr, 4

Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications

GPUs are emerging as general-purpose high-performance computing devices. Growing GPGPU research has made numerous GPGPU workloads available. However, a systematic approach to characterize these benchmarks and analyze their implications for GPU microarchitecture design evaluation is still lacking. In this research, we propose a set of microarchitecture-agnostic GPGPU workload characteristics to represent them […]

CUDA

Apr, 4

Parallel Exact Inference on a CPU-GPGPU Heterogenous System

Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to […]

CUDA

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
