high performance computing on graphics processing units: hgpu.org

Posts

Dec, 9

Exploring the Optimization Space of Multi-Core Architectures with OpenCL Benchmarks

Open Computing Language (OpenCL) is an open standard for writing portable software for heterogeneous architectures such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Programs written in OpenCL are functionally portable across architectures. However, due to architectural differences, OpenCL does not guarantee performance portability. As previous research shows, different architectures are sensitive […]

CUDA • OpenCL

Dec, 8

High Performance Multi-agent System based Simulations

Real-life city-traffic simulation is a good example of multi-agent simulation involving a large number of agents (each human modelled as an individual agent). Analysis of emergent behaviors in social simulations largely depends on the number of agents involved (at least 100,000 agents). Due to the large number of agents involved, it takes several seconds […]

CUDA

Dec, 8

Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes

PETSc is a scalable solver library developed at Argonne National Laboratory (ANL). It is widely used for solving systems of equations arising from the discretisation of partial differential equations (PDEs). GPU support has recently been added to PETSc to exploit the performance of GPUs. This support is quite new and currently only available in the PETSc […]

CUDA • OpenCL

Dec, 8

High performance dense linear system solver with soft error resilience

As the scale of modern high-end computing systems continues to grow rapidly, system failure has become an issue that requires a better solution than the commonly used scheme of checkpoint and restart (C/R). While hard errors have been studied extensively over the years, soft errors are still under-studied, especially for modern HPC systems, and […]

Dec, 8

Three-Dimensional Modeling of Long-Wave Runup: Simulation of Tsunami Inundation with GPU-SPHysics

Tsunamis need to be studied more carefully and quantitatively to fully understand their destructive impact on coastal areas. Numerical modeling provides an accurate and useful method to model tsunami inundations on a coastline. However, models must undergo a detailed verification and validation process to be used as an accurate hazard assessment tool. Using standards and […]

CUDA

Dec, 8

Graphics Processing Units for the Real-time Linear Elastostatic Simulation of Liver

Biomedical engineering solutions like surgical simulators need High Performance Computing (HPC) to achieve real-time performance. Graphics Processing Units (GPUs) offer HPC capabilities at low cost and low power consumption. In this work, it is demonstrated that a liver discretized by about 2500 finite element nodes can be graphically simulated in real time by making […]

OpenGL

Dec, 8

Speeding up the evaluation phase of GP classification algorithms on GPUs

The efficiency of evolutionary algorithms has become a studied problem since it is one of the major weaknesses of these algorithms. Specifically, when these algorithms are employed for the classification task, the computational time they require grows excessively as the problem complexity increases. This paper proposes an efficient, scalable, and massively parallel evaluation model […]

CUDA

Dec, 8

An Algorithm for Detecting Cycles in Undirected Graphs using CUDA Technology

Counting cycles in a graph is an NP-complete problem. This work cuts the execution time needed to solve the problem relative to the traditional serial, CPU-based approach. It reduces the hardware resources needed to a single commodity GPU. We developed an algorithm to approximately count the number of cycles in an undirected graph, by […]

CUDA

Dec, 8

Fast extraction of neuron morphologies from large-scale SBFSEM image stacks

Neuron morphology is frequently used to classify cell types in the mammalian cortex. Apart from the shape of the soma and the axonal projections, morphological classification is largely defined by the dendrites of a neuron and their subcellular compartments, referred to as dendritic spines. The dimensions of a neuron’s dendritic compartment, including its spines, are also […]

CUDA

Dec, 8

Design and Optimization of Image Processing Algorithms on Mobile GPU

The advent of GPUs with programmable shaders on mobile phones has motivated developers to use the GPU to offload computationally intensive tasks and relieve the burden on the embedded CPU. In this paper, we present a set of metrics to measure characteristics of a mobile phone GPU with a focus on image processing algorithms. These measures assist […]

Dec, 8

Research on CUDA-based Kriging Interpolation Algorithm

A three-dimensional geological model can describe the types of geological information efficiently and express a variety of topological relations among geological phenomena intuitively. The Kriging interpolation algorithm is an important spatial interpolation method in three-dimensional geological modeling, but for every grid point it must compute an augmented matrix and solve equations, so it costs too much time. With the modeling […]

CUDA

Dec, 7

Evolving Neural Networks on GPUs

Financial Time Series prediction attempts to model the behavior of financial markets using, among other things, tools like technical, intermarket, and fundamental indicators. Accurate prediction, however, is difficult for a number of reasons: financial markets are influenced, often in a non-linear, sometimes time-lagged fashion, by factors including interest and exchange rates, the rate of economic […]

CUDA

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
