high performance computing on graphics processing units: hgpu.org

Posts

Dec, 19

Implementation of 802.11n on 128-CORE Processor

This article presents the results of research on applying modern Graphics Processing Units in the field of telecommunications. The most recent Wireless Local Area Network protocol, 802.11n, was studied, as it introduces a significant increase in computational complexity. Taking into consideration the concept of Software Defined Radio, the implementation of PHY algorithms was devised […]

CUDA

Dec, 19

GPU Acceleration of Particle-based Volume Rendering using CUDA

In this paper, we apply the Particle-based Volume Rendering (PBVR) technique using a current programmable GPU architecture. The increasing programmability of GPUs offers an efficient way to use SIMD parallel algorithms to solve the speed problem. Because each point or pixel can be calculated independently, we use programmable graphics hardware to delegate all expensive […]

CUDA

Dec, 19

GPU-based parallelization for fast circuit optimization

The progress of GPU (Graphics Processing Unit) technology opens a new avenue for boosting computing power. This work is an attempt to exploit GPUs for accelerating VLSI circuit optimization. We propose GPU-based parallel computing techniques and apply them to simultaneous gate sizing and threshold voltage assignment, which is often employed in practice for performance and […]

CUDA

Dec, 19

Particle-based volume rendering

In this paper, we introduce a novel point-based volume rendering technique based on tiny particles. In the proposed technique, a set of tiny opaque particles is generated from a given 3D scalar field based on a user-specified transfer function and the rejection method. The final image is then generated by projecting these particles onto the […]

Dec, 19

Hardware Accelerated Skin Deformation for Animated Crowds

Real time rendering of animated crowds has many practical multimedia applications. The Graphics Processor Unit (GPU) is being increasingly employed to accelerate associated rendering and deformation calculations. This paper explores skeletal deformation calculations on the GPU for crowds of articulated figures. It compares a few strategies for efficient reuse of such calculations on clones. We […]

Dec, 19

Speed Records for NTRU

In this paper, NTRUEncrypt is implemented for the first time on a GPU using the CUDA platform. As is shown, this operation lends itself perfectly to parallelization and performs extremely well compared to similar security levels for ECC and RSA, giving speedups of around three to five orders of magnitude. The focus is on achieving […]

CUDA

Dec, 18

Accelerating S3D: A GPGPU Case Study

The graphics processor (GPU) has evolved into an appealing choice for high performance computing due to its superior memory bandwidth, raw processing power, and flexible programmability. As such, GPUs represent an excellent platform for accelerating scientific applications. This paper explores a methodology for identifying applications which present significant potential for acceleration. In particular, this work […]

CUDA

Dec, 18

Accelerating Regularized Iterative CT Reconstruction on Commodity Graphics Hardware (GPU)

Iterative reconstruction algorithms augmented with regularization can produce high-quality reconstructions from few views and even in the presence of significant noise. In this paper we focus on the particularities associated with the GPU acceleration of these. First, we introduce the idea of using exhaustive benchmark tests to determine the optimal settings of various parameters in […]

OpenGL

Dec, 18

Long time-scale simulations of in vivo diffusion using GPU hardware

To address the problem of performing long time simulations of biochemical pathways under in vivo cellular conditions, we have developed a lattice-based, reaction-diffusion model that uses the graphics processing unit (GPU) as a computational co-processor. The method has been specifically designed from the beginning to take advantage of the GPU’s capacity to perform massively parallel […]

CUDA

Dec, 18

Large-scale FFT on GPU clusters

A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and bandwidth-intensive tasks with data locality (e.g. finite-difference simulation). Bandwidth-intensive tasks such as large-scale FFTs without data locality are harder to accelerate, as the bottleneck often lies with the PCI between […]

CUDA

Dec, 18

Shader Performance Analysis on a Modern GPU Architecture

This paper presents an analysis of the performance of the shader processing units in a modern graphics processor unit (GPU) architecture using real graphics applications. The architecture of a modern GPU is described, and a simulator and associated framework used to evaluate the architecture are introduced. The paper analyses the effects on performance of different […]

OpenGL

Dec, 18

GPU clusters for high-performance computing

Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
