high performance computing on graphics processing units: hgpu.org

Posts

May, 3

Accelerating BP Neural Network-Based Image Compression by CPU and GPU Cooperation

Recently, the GPU has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational capability and very high memory bandwidth. At the same time, multi-core CPUs have continued to evolve, and today’s CPUs have 4-8 cores, which offer dramatically increased performance and power savings. We are aware of very few works that consider both devices […]

May, 3

Statistical power modeling of GPU kernels using performance counters

We present a statistical approach for estimating power consumption of GPU kernels. We use the GPU performance counters that are exposed for CUDA applications, and train a linear regression model where performance counters are used as independent variables and power consumption is the dependent variable. For model training and evaluation, we use publicly available CUDA […]

CUDA
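The regression setup described in the abstract above (performance counters as independent variables, power consumption as the dependent variable) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the fit uses ordinary least squares through the normal equations, and any counter values passed in would have to come from real measurements.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: counters are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_power_model(counter_rows, power_watts):
    """Least-squares fit of power = b0 + sum(b_i * counter_i).

    counter_rows: one list of counter readings per kernel run (hypothetical input).
    power_watts:  measured power for each run.
    Returns [b0, b1, ..., bn].
    """
    # Prepend 1.0 to every row so b0 acts as the intercept term.
    rows = [[1.0] + [float(v) for v in r] for r in counter_rows]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, power_watts)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_power(coeffs, counters):
    """Evaluate the fitted linear model for one set of counter readings."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], counters))
```

In practice the counters would likely need normalizing (for example per cycle or per second) before fitting, and a library least-squares routine would be numerically safer than the normal equations used here.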

May, 3

GPU-based elastic-object deformation for enhancement of existing haptic applications

Most haptic libraries allow the user to feel the resistance of a flexible virtual object by implementing a point-based collision detection algorithm and a spring-damper model. Even though the user can feel the deformation at the contact point, the graphics library renders a rigid geometry, causing a conflict of senses in the user’s mind. […]

OpenGL
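For readers unfamiliar with the spring-damper model mentioned in the abstract above, here is a minimal one-dimensional sketch. It assumes the standard form F = -k·x - c·v and a semi-implicit Euler time step; the function names and all parameter values are hypothetical and are not taken from the paper.

```python
def spring_damper_force(displacement, velocity, stiffness, damping):
    """Restoring force of a linear spring with a viscous damper.

    displacement: offset from the rest position
    velocity:     rate of change of that offset
    """
    return -stiffness * displacement - damping * velocity


def step(position, velocity, rest, mass, stiffness, damping, dt):
    """Advance a single contact point by one time step (semi-implicit Euler)."""
    force = spring_damper_force(position - rest, velocity, stiffness, damping)
    # Update velocity first, then use the new velocity for the position.
    velocity = velocity + (force / mass) * dt
    position = position + velocity * dt
    return position, velocity


def simulate(position, velocity, steps, rest=0.0, mass=1.0,
             stiffness=100.0, damping=2.0, dt=0.001):
    """Run several steps and return the final (position, velocity)."""
    for _ in range(steps):
        position, velocity = step(position, velocity, rest, mass,
                                  stiffness, damping, dt)
    return position, velocity
```

In a haptic loop the force returned by `spring_damper_force` would be sent to the device each cycle, while the same displacement could drive the rendered mesh so that what the user sees matches what they feel.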

May, 3

A Framework of Large-Scale Terrain Visualization Based on GPU

This paper proposes an efficient solution for large-scale, interactive terrain visualization based on graphics hardware. Multithreading is adopted to resolve the bottleneck of fetching data from external memory, and the methods of level of detail (LOD) based on view-dependent and mesh rendering in graphics programmable units (GPU) […]

May, 3

Realtime Simulation of Burning Solids on GPU with CUDA

In this paper, we implement a hybrid combustion model that combines fire simulation and solid burning. To achieve real-time performance, the GPU is used both to solve the Navier-Stokes equations with CUDA and to visualize the turbulent fire. Experimental results demonstrate our method’s efficiency.

CUDA

May, 3

GPU-CPU multi-core for real-time signal processing

Modern graphics cards provide powerful computational facilities for fast computation of vertex geometry and realistic rendering of 3D graphics. The introduction of a programmable pipeline in graphics processing units (GPUs) has made them configurable. The GPU, which is available in every computer, is capable of highly parallel SIMD processing, but its capability is […]

OpenGL

May, 3

Real-time Medical Image Volume Rendering Based on GPU Accelerated Method

With the growing power and flexibility of modern GPUs, hardware-based volume rendering techniques have proven highly effective at accelerating rendering. In this paper, we propose a 3D texture-based algorithm with a local Phong lighting model. It loads a medical image series as volume data, creates a 3D texture from the volume data, then sets up and renders […]

May, 3

Acceleration of large-scale FDTD simulations on high performance GPU clusters

In this paper, a scalable graphics processing unit (GPU) cluster solution for the acceleration of FDTD for large-scale simulations is proposed. The hardware and software implementations are described. To illustrate the speed performance of the cluster, the simulation results of a cubic resonator with PEC boundaries are presented. A realistic large-scale simulation performed using SEMCAD X […]

May, 3

Exploiting GPU On-chip Shared Memory for Accelerating Schedulability Analysis

Embedded electronic devices like mobile phones and automotive control units must perform under strict timing constraints. As such, schedulability analysis constitutes an important phase of the design cycle of these devices. Unfortunately, schedulability analysis for most realistic task models turns out to be computationally intractable (NP-hard). Naturally, in the recent past, different techniques have been […]

CUDA

May, 3

GPU based acceleration architecture for image enhancement in spatial domain

To reduce the processing time of image enhancement in the spatial domain, a GPU (Graphics Processing Unit) based acceleration architecture is proposed and implemented. Using a structured design method, the computing model, data, and algorithm resources that are indispensable in GPU computation are encapsulated and computed directly at high performance with CUDA (Compute Unified Device Architecture). […]

CUDA

May, 1

Implementation of a 3GPP LTE turbo decoder accelerator on GPU

This paper presents a 3GPP LTE compliant turbo decoder accelerator on GPU. The challenge of implementing a turbo decoder is finding an efficient mapping of the decoder algorithm on GPU, e.g. finding a good way to parallelize workload across cores and allocate and use fast on-die memory to improve throughput. In our implementation, we increase […]

May, 1

Sparse Matrix Formats Evaluation and Optimization on a GPU

The data-parallel programming model is making a comeback with massively multicore architectures. The GPU is one of these and offers significant opportunities to accelerate linear algebra. However, the irregular structure of sparse matrix operations makes it hard to obtain efficient performance with this programming model. Performance depends on the format used to store values and the matrix […]
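To make the role of the storage format concrete, the sketch below shows compressed sparse row (CSR), one widely used sparse format, together with a sequential matrix-vector product. CSR is chosen here only as a familiar example; the truncated abstract does not say which formats the paper evaluates, and the function names are hypothetical.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    Returns (values, col_indices, row_ptr), where row i occupies the
    slice row_ptr[i]:row_ptr[i + 1] of values and col_indices.
    """
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        # Mark where the next row starts in the flat arrays.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Rows have different lengths, which is the source of the
        # irregular, load-imbalanced access pattern on parallel hardware.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y
```

On a GPU the outer loop over rows would typically be distributed across threads, so rows with very different numbers of nonzeros leave some threads idle; that imbalance is one reason the choice of format affects performance.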

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
