high performance computing on graphics processing units: hgpu.org

Posts

Apr, 5

Enhancing Performance of Simulations using GPGPU

General Purpose GPU computing, or GPGPU, is the use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing. The model for GPU computing is to use a CPU and GPU together in a heterogeneous co-processing computing platform. The sequential part of the application runs on the CPU and the computationally-intensive […]

CUDA

Apr, 5

Image Processing on Graphical Processing Units for faster DNA Sequencing

Next generation DNA sequencing technologies generate terabytes of image data in a typical run over several days. Compute power to process the increasing amount of image data is becoming a problem in next generation sequencing. We propose to use the compute power of Graphical Processing Units (GPUs) to address this problem. GPUs have an efficient […]

CUDA

Apr, 4

A High-Performance Brownian Bridge for GPUs: Lessons for Bandwidth Bound Applications

We present a very flexible Brownian bridge generator together with a GPU implementation which achieves close to peak performance on an NVIDIA C2050. The performance is compared with an OpenMP implementation run on several high performance x86-64 systems. The GPU shows a performance gain of at least 10x. Full comparative results are given in Section […]

CUDA

Apr, 4

Depth Estimation using Open Compute Language (OpenCL)

3D video and related technologies such as view synthesis and 2D-to-3D video conversion rely heavily on depth/disparity maps extracted from stereo video content. An innovative segment-based depth map extraction chain for stereo video content was proposed in [1], giving a good trade-off between quality (closeness to the ground truth) and computational complexity. We accelerated this work further by ~150%, […]

OpenCL

Apr, 4

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation

GPU-to-CPU translation may extend the execution of Graphics Processing Unit (GPU) programs to multi-/many-core CPUs, and hence enable cross-device task migration and promote whole-system synergy. This paper describes some of our findings on the treatment of GPU synchronizations during the translation process. We show that careful dependence analysis may allow a fine-grained treatment of synchronizations and reveal redundant […]

CUDA

Apr, 4

Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices

The Jacobi algorithm for the Karhunen-Loeve transform of a symmetric real matrix and its parallel implementation using the chess tournament algorithm are revisited in this paper. The impact of memory access patterns and the significance of memory coalescing for the performance of the GPU implementation of the parallel Jacobi algorithm are emphasized. Two novel memory access methods for the Jacobi […]

CUDA

Apr, 4

Implementation Of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs

With the belief propagation (BP) decoding algorithm, low-density parity-check (LDPC) codes can achieve near-Shannon limit performance. LDPC codes can achieve bit error rates (BERs) as low as $10^{-15}$ even at a small bit-energy-to-noise-power-spectral-density ratio ($E_{b}/N_{0}$). In order to evaluate the error performance of LDPC codes, simulators running on central processing units (CPUs) are […]

CUDA

Apr, 3

Temporally Consistent Disparity and Optical Flow via Efficient Spatio-temporal Filtering

This paper presents a new efficient algorithm for computing temporally consistent disparity maps from video footage. Our method is motivated by recent work [1] that achieves high quality stereo results by smoothing disparity costs with a fast edge-preserving filter. This previous approach was designed to work with single static image pairs and does not maintain […]

CUDA

Apr, 2

Towards Adaptive GPU Resource Management for Embedded Real-Time Systems

In this paper, we present two conceptual frameworks for GPU applications to adjust their task execution times based on total workload. These frameworks enable smart GPU resource management when many applications share GPU resources while the workloads of those applications vary. Application developers can explicitly adjust the number of GPU cores depending on their needs. […]

CUDA

Apr, 2

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). The recent emergence of VLSI technology makes it feasible to fabricate multiple processing cores on a single chip and enables general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements […]

CUDA

Apr, 2

On the Cryptanalysis of Public-Key Cryptography

Nowadays, the most popular public-key cryptosystems are based on either the integer factorization or the discrete logarithm problem. The feasibility of solving these mathematical problems in practice is studied and techniques are presented to speed up the underlying arithmetic on parallel architectures. The fastest known approach to solve the discrete logarithm problem in groups of elliptic […]

CUDA

Apr, 2

GPU Programming Strategies and Trends in GPU Computing

Over the last decade, there has been a growing interest in the use of graphics processing units (GPUs) for nongraphics applications. From early academic proof-of-concept papers around the year 2000, the use of GPUs has now matured to a point where there are countless industrial applications. Together with the expanding use of GPUs, we have […]

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
