Posts
Jun, 24
GPU Implementation of the Particle Filter
This thesis analyses the obstacles faced when adapting the particle filtering algorithm to run on massively parallel compute architectures. Graphics processing units are one example of such architectures, allowing the developer to distribute computational load over hundreds or thousands of processor cores. This thesis studies an implementation written for NVIDIA […]
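The thesis itself is not reproduced here, but the kind of step it parallelizes is easy to sketch: the per-particle weight update is embarrassingly parallel and maps naturally to one GPU thread per particle (the resampling step, which needs a global cumulative sum of the weights, is typically the harder part to port). The CUDA kernel below is only an illustrative sketch under that assumption; its names and its 1-D Gaussian likelihood are not taken from the thesis.

// Illustrative sketch, not from the thesis: per-particle weight update for a
// 1-D state, one thread per particle. The Gaussian measurement likelihood and
// all names are assumptions for illustration.
__global__ void updateWeights(const float* particles, float* weights,
                              float measurement, float noiseVar, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float diff = measurement - particles[i];
        // Multiply the particle's weight by the (unnormalized) likelihood of
        // the measurement given the particle's state.
        weights[i] *= expf(-0.5f * diff * diff / noiseVar);
    }
}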
Jun, 24
Integrating Two-Way Interaction Between Fluids and Rigid Bodies in the Real-Time Particle Systems Library
In the last 15 years, video games have become a dominant form of entertainment. The popularity of video games means children are spending more of their free time playing video games. Usually, the time spent on homework or studying is decreased to allow for the extended time spent on video games. In an effort to […]
Jun, 24
A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels
We present an approach to investigate the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach allows for quickly identifying any significant differences between the execution of the many blocks and warps. As interesting warps are identified, we allow further investigation of memory behavior by […]
Jun, 24
An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches
With each successive generation and the ever-increasing promise of computing power, GPGPUs have been quickly growing in size, and at the same time energy consumption has become a major bottleneck for them. The first level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient due […]
Jun, 24
Provably Efficient GPU Algorithms
In this paper we present an abstract model for algorithm design on GPUs by extending the parallel external memory (PEM) model with computations in internal memory (commonly known as shared memory in GPU literature) defined in the presence of memory banks and bank conflicts. We also present a framework for designing bank-conflict-free algorithms […]
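The paper's framework is not reproduced here, but the classic shared-memory padding trick below shows what "bank conflict free" means in practice on CUDA hardware: with 32 banks, a 32x32 tile places an entire column in a single bank, and one extra padding column skews the mapping so column accesses hit distinct banks. The tiled transpose kernel is a standard illustration, not code from the paper.

// Standard illustration of bank-conflict-free shared-memory access via padding;
// not taken from the paper. Launch with 32x32 thread blocks.
#define TILE 32

__global__ void transpose(const float* in, float* out, int width, int height)
{
    __shared__ float tile[TILE][TILE + 1];   // +1 column of padding avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    __syncthreads();

    // Read the tile transposed; walking down a column stays conflict-free
    // only because of the padding above.
    int tx = blockIdx.y * TILE + threadIdx.x;
    int ty = blockIdx.x * TILE + threadIdx.y;
    if (tx < height && ty < width)
        out[ty * height + tx] = tile[threadIdx.x][threadIdx.y];
}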
Jun, 23
The 22nd High Performance Computing Symposium, HPC 2014
The 2014 Spring Simulation Multiconference will feature the 22nd High Performance Computing Symposium (HPC 2014), devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and […]
Jun, 23
Workshop on GPU Programming for Molecular Modeling
The GPU Programming for Molecular Modeling workshop will extend GPU programming techniques to the field of molecular modeling, including subjects such as particle-grid algorithms (electrostatics, molecular surfaces, density maps, and molecular orbitals), particle-particle algorithms with an emphasis on non-bonded force calculations, radial distribution functions in GPU histogramming, single-node multi-GPU algorithms, and GPU clusters. Specific examples […]
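One of the listed topics, GPU histogramming (as used for radial distribution functions), reduces to a simple pattern: each thread bins one sample into a block-local histogram in shared memory using atomics, and the partial histograms are then merged into global memory. The kernel below is a generic sketch of that pattern, not workshop material; the bin count and all names are assumptions.

// Generic GPU histogramming sketch (not workshop material); bin count and
// names are illustrative assumptions.
#define NUM_BINS 256

__global__ void histogram(const float* samples, unsigned int* bins,
                          int n, float binWidth)
{
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;                       // clear the block-local histogram
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Clamp the bin index so out-of-range samples land in the edge bins.
        int b = min(max((int)(samples[i] / binWidth), 0), NUM_BINS - 1);
        atomicAdd(&local[b], 1u);           // contention stays within the block
    }
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);      // merge into the global histogram
}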
Jun, 23
Non-Uniformly Partitioned Block Convolution on Graphics Processing Units
Real-time convolution has many applications, among them simulating room reverberation in audio processing. Non-uniformly partitioned filters can satisfy both desired features of an efficient convolution: low latency and low computational complexity. However, distributing the computation so that it places a uniform demand on the Central Processing Unit (CPU) is still challenging. Moreover, computational […]
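The paper's scheme is not reproduced here, but the idea behind non-uniform partitioning is easy to illustrate: the impulse response is split into segments whose length grows toward the tail, so the short head segments keep the input-to-output latency low while the long tail segments amortize the per-sample cost. The host-side sketch below prints one such schedule; the segment-doubling rule, block size, and impulse-response length are illustrative assumptions, not values from the paper.

// Illustrative partitioning schedule only, not the paper's scheme: two
// segments per size, doubling the segment length toward the tail.
#include <cstdio>

int main()
{
    const int irLength  = 48000;   // e.g. a one-second impulse response at 48 kHz (assumed)
    const int blockSize = 128;     // audio I/O block size, sets the latency floor (assumed)

    int offset = 0, segLen = blockSize;
    while (offset < irLength) {
        for (int rep = 0; rep < 2 && offset < irLength; ++rep) {
            int len = (offset + segLen <= irLength) ? segLen : irLength - offset;
            printf("segment at %6d, length %6d\n", offset, len);
            offset += len;
        }
        segLen *= 2;               // later segments are longer, hence cheaper per sample
    }
    return 0;
}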
Jun, 23
GPU Implementation of the DP code
The main goal of this PRACE project was to evaluate how GPUs could speed up the DP code, a linear response TDDFT code. Profiling analysis of the code was performed to identify the computational bottlenecks to be delegated to the GPU. In order to speed up this code using GPUs, two different strategies have been […]
Jun, 22
CUDA Enhanced Simulated Annealing for Chip Layout Problem
This paper introduces an implementation of a parallel solution for the chip layout problem on the NVIDIA CUDA framework. The experiment allows for varying chip sizes, interconnecting signals, and three chip transformations: rotate, swap, and translate. Total signal distance is minimized as the system converges toward an optimal solution using simulated annealing. Lee’s maze routing […]
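The paper's CUDA implementation is not shown here; the self-contained host-side sketch below only illustrates the Metropolis acceptance rule that drives simulated annealing, using a toy one-dimensional "layout" and a swap move as hypothetical stand-ins for the paper's chip layout, signal-distance cost, and three transformations.

// Toy simulated-annealing loop illustrating the Metropolis acceptance rule.
// The vector-of-positions "layout" and the wirelength cost are hypothetical
// stand-ins, not the paper's data structures.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

static double wirelength(const std::vector<double>& x)
{
    double d = 0.0;
    for (size_t i = 1; i < x.size(); ++i) d += std::fabs(x[i] - x[i - 1]);
    return d;
}

int main()
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    std::vector<double> layout(64);
    for (auto& v : layout) v = uni(rng) * 100.0;          // random initial placement

    double cost = wirelength(layout);
    for (double temp = 10.0; temp > 1e-3; temp *= 0.999) {
        // Propose a "swap" move, one of the transformations the abstract lists.
        size_t a = rng() % layout.size(), b = rng() % layout.size();
        std::swap(layout[a], layout[b]);
        double delta = wirelength(layout) - cost;

        // Metropolis rule: always accept improvements; accept uphill moves
        // with probability exp(-delta / temperature).
        if (delta <= 0.0 || uni(rng) < std::exp(-delta / temp))
            cost += delta;
        else
            std::swap(layout[a], layout[b]);              // undo the rejected move
    }
    printf("final wirelength: %f\n", cost);
    return 0;
}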
Jun, 22
Exploring GPGPUs Workload Characteristics and Power Consumption
While general purpose computing on GPUs continues to enjoy higher computing performance with every new generation, the high power consumption of GPUs is an increasingly important concern. To create power-efficient GPUs, it is important to thoroughly study their power consumption. The power consumption of GPUs varies significantly with workloads. Therefore, in this work we study […]
Jun, 22
Virtualization and Migration with GPGPUs
Recently, cloud computing providers have started to offer virtual machines specifically for high performance computing as a service (HPCaaS). The cloud computing providers usually employ virtualization as an abstraction layer between the application software and the underlying hardware. Virtualization allows flexible migration between physical systems, which is a requirement for many load balancing techniques. In […]