high performance computing on graphics processing units: hgpu.org

Posts

Jun, 9

Improvement Study of EEMD Decomposition Efficiency Based on CUDA Architecture

EEMD can suppress the mode mixing that may occur in EMD. Building on EMD, EEMD adds many groups of white noise to the original signal to assist the analysis; however, this greatly reduces the efficiency of the signal decomposition. In order to eliminate the effects of mode mixing and improve […]

CUDA
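
The efficiency problem the abstract describes comes from the ensemble structure of EEMD: every added white-noise realization needs its own full decomposition, and the results are then averaged. The Python sketch below shows only that outer ensemble loop, under stated assumptions: `eemd`, `moving_average_split` and the default `trials`/`noise_std` values are hypothetical, and `moving_average_split` is a stand-in for real EMD sifting (it is not EMD), used only so the loop runs. It also assumes every trial returns the same number of components.

```python
import random
from typing import Callable, List

Signal = List[float]
# A decomposition maps a signal to a list of components that sum back to it.
Decomposer = Callable[[Signal], List[Signal]]


def moving_average_split(x: Signal, half_window: int = 2) -> List[Signal]:
    """Hypothetical stand-in for EMD sifting (NOT real EMD): splits x into a
    fast 'detail' part and a slow moving-average 'trend' that sum to x."""
    n = len(x)
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        trend.append(sum(x[lo:hi]) / (hi - lo))
    detail = [xi - ti for xi, ti in zip(x, trend)]
    return [detail, trend]


def eemd(signal: Signal, decompose: Decomposer, trials: int = 100,
         noise_std: float = 0.2, seed: int = 0) -> List[Signal]:
    """Ensemble loop of EEMD: decompose many noisy copies, average the results.

    Assumes every trial returns the same number of components."""
    rng = random.Random(seed)
    acc: List[Signal] = []
    for _ in range(trials):
        # Add an independent white-noise realization to the original signal.
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        components = decompose(noisy)  # the expensive step, repeated per trial
        if not acc:
            acc = [[0.0] * len(signal) for _ in components]
        for k, comp in enumerate(components):
            for i, value in enumerate(comp):
                acc[k][i] += value
    # Averaging over the ensemble lets the added noise largely cancel out.
    return [[value / trials for value in comp] for comp in acc]
```

Because the trials are independent of one another, they are a natural unit of parallel work on a GPU; whether the paper parallelizes at this level is not stated in the excerpt.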

Jun, 9

Efficient all-against-all protein similarity matrix computation using OpenCL

In this report we introduce CLSW, a fast GPU-based Smith-Waterman score-only alignment calculator. Although it is applicable to any protein alignment problem, it was designed specifically as a proof-of-concept application for SIMAP. Even though we had only two weeks to develop a fully functional, validated and optimized implementation and all related concepts, our results show that in […]

OpenCL
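
A score-only Smith-Waterman calculation needs only the best local alignment score, not the alignment itself, so keeping one previous row of the dynamic-programming matrix is enough. The sketch below is a plain sequential Python version under simplifying assumptions: the name `sw_score`, the linear gap penalty and the match/mismatch scores are illustrative; protein tools normally use a substitution matrix such as BLOSUM62 with affine gaps, and nothing here reflects CLSW's actual OpenCL kernel.

```python
def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps).

    Score-only: keeps just the previous DP row, so memory is O(len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `sw_score("ACGT", "ACT")` gives 4 with the default scores: three matches (+6) and one gap (-2). Cells on the same anti-diagonal do not depend on each other, which is one common way GPU implementations extract parallelism within a single alignment; another is to run many independent alignments at once, as an all-against-all matrix allows.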

Jun, 9

GPU-Accelerated Dynamic Functional Connectivity Analysis for Functional MRI Data Using OpenCL

Computationally intensive tasks in engineering and science, especially bioinformatics, have been made practical by recent advances in Graphics Processing Unit (GPU) computing technology. In this study, we present an implementation and performance evaluation of GPU-accelerated dynamic functional connectivity (DFC) analysis, a method for investigating dynamic interactions among different brain networks. Open Computing […]

OpenCL
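
One common way to compute DFC is sliding-window correlation: for each window position, a region-by-region Pearson correlation matrix is built from the fMRI time series. The excerpt does not say which DFC method the paper uses, so the Python sketch below is an illustration under that assumption; `sliding_window_fc`, `window` and `step` are hypothetical names, and a constant window is assigned a correlation of 0 purely by convention here.

```python
from typing import List

Series = List[float]


def pearson(x: Series, y: Series) -> float:
    """Pearson correlation; returns 0.0 for a constant window (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def sliding_window_fc(ts: List[Series], window: int,
                      step: int = 1) -> List[List[List[float]]]:
    """ts[r] is the time series of region r; returns one correlation matrix per window."""
    n_regions, n_time = len(ts), len(ts[0])
    matrices = []
    for start in range(0, n_time - window + 1, step):
        seg = [r[start:start + window] for r in ts]
        m = [[1.0 if i == j else 0.0 for j in range(n_regions)] for i in range(n_regions)]
        # Each (window, region pair) entry is independent of all others.
        for i in range(n_regions):
            for j in range(i + 1, n_regions):
                m[i][j] = m[j][i] = pearson(seg[i], seg[j])
        matrices.append(m)
    return matrices
```

Since every window and every region pair is computed independently, this workload maps naturally onto many parallel OpenCL work-items.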

Jun, 9

3D Skeleton Extraction Method using Potential Field on OpenCL

For 3D skeleton extraction, the algorithm based on generalized potential fields, known as an exceptionally flexible and robust method, suffers from a very heavy computational burden. In this paper, we propose a parallel algorithm based on the OpenCL heterogeneous parallel framework, which can make full use of the computing power provided by the heterogeneous model […]

OpenCL

Jun, 9

Multi-level parallelization for hybrid ACO

The Graphics Processing Unit (GPU) has become one of the main platforms for designing massively parallel metaheuristics. This is due to the highly parallel architecture of the GPU and especially to the release of languages like CUDA. In this paper, we present a multilevel parallel hybrid Ant System (AS) to solve the Travelling Salesman Problem (TSP). […]

CUDA
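
In the classic Ant System, each ant builds its tour city by city, choosing the next city with probability proportional to tau^alpha * (1/d)^beta; pheromone then evaporates by a factor (1 - rho), and each ant deposits Q/L on the edges of its tour of length L. The sequential Python sketch below shows only that core loop; the paper's multilevel parallelization and its hybrid component are not reproduced, and the function names and parameter defaults are illustrative assumptions. It assumes at least two cities with distinct coordinates, so no distance is zero.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def tour_length(tour: List[int], pts: List[Point]) -> float:
    """Length of the closed tour (returns to the start city)."""
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def ant_system(pts: List[Point], n_ants: int = 10, iterations: int = 50,
               alpha: float = 1.0, beta: float = 2.0, rho: float = 0.5,
               q: float = 1.0, seed: int = 0) -> Tuple[List[int], float]:
    """Sequential Ant System for the TSP (>= 2 distinct cities assumed)."""
    rng = random.Random(seed)
    n = len(pts)
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    tau = [[1.0] * n for _ in range(n)]  # initial pheromone on every edge
    best_tour: List[int] = []
    best_len = math.inf
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):  # ants are independent within an iteration
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1/distance)^beta.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)
        # Evaporation on all edges.
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        # Each ant deposits q / L on the (symmetric) edges of its tour.
        for tour in tours:
            length = tour_length(tour, pts)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len
```

Within one iteration the ants do not interact, so tour construction is a usual first level of parallelism for GPU ant colony implementations; the specific levels used in the paper are not described in the excerpt.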

Jun, 8

The Performance Analysis Based on Heterogeneous Parallel Processors for Anisotropic Diffusion Filters

Noise in digital images degrades the performance of image processing. Such images are widely used in the medical field for diagnosis and treatment, so there is a huge demand for high-quality images. Current algorithms for producing usable images are based on the Gaussian blur filter. However, using such isotropic […]

CUDA
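
Anisotropic diffusion filters differ from an isotropic Gaussian blur by slowing diffusion across strong intensity differences, so edges survive the smoothing. The classic Perona-Malik scheme is sketched below in Python as one example of such a filter; the excerpt does not name the specific filters the paper evaluates. The function name, the exponential conductance exp(-(d/kappa)^2) (where d is the intensity difference to a neighbour) and the parameter defaults are illustrative assumptions. The step size `lam` is kept at or below 0.25, the usual stability bound for this explicit four-neighbour scheme.

```python
import math
from typing import List

Image = List[List[float]]


def perona_malik(img: Image, iterations: int = 10, kappa: float = 20.0,
                 lam: float = 0.2) -> Image:
    """Explicit Perona-Malik anisotropic diffusion on a 2D grayscale image.

    Conductance exp(-(d/kappa)^2) shrinks for large neighbour differences d,
    so strong edges diffuse little. Pixels outside the image contribute no
    flux (zero-flux boundary). lam <= 0.25 keeps the 4-neighbour scheme stable."""
    if not 0.0 < lam <= 0.25:
        raise ValueError("lam must be in (0, 0.25]")
    h, w = len(img), len(img[0])
    cur = [list(map(float, row)) for row in img]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for y in range(h):
            for x in range(w):
                centre = cur[y][x]
                flux = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        d = cur[ny][nx] - centre
                        flux += math.exp(-(d / kappa) ** 2) * d
                # Each pixel reads only the previous iterate, so updates are independent.
                nxt[y][x] = centre + lam * flux
        cur = nxt
    return cur
```

Because each pixel's update depends only on the previous iteration, assigning one GPU thread per pixel is a natural mapping; how the paper distributes the work across its heterogeneous processors is not stated in the excerpt.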

Jun, 8

Native Offload of Haskell Repa Programs to GPGPU

In light of recent hardware advances, General Purpose Graphics Processing Units (GPGPUs) are becoming increasingly commonplace and demand novel programming models to account for their radically different architecture. For the most part, existing approaches to programming GPGPUs within a high-level programming language choose to embed a domain-specific language (DSL) within a host metalanguage and […]

OpenCL

Jun, 8

A numerical tour of wave propagation

This tutorial is written for beginners as an introduction to basic wave propagation using the finite difference method, from acoustic and elastic wave modeling to reverse time migration and full waveform inversion. Most of the theoretical derivations summarized in this tutorial have been implemented in Madagascar with Matlab, C and CUDA programming, which will benefit readers’ […]

CUDA
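
The finite difference approach can be illustrated with the 1D acoustic wave equation u_tt = c^2 u_xx, discretized with second-order central differences in time and space. The Python sketch below is a generic textbook scheme, not code from the tutorial or from Madagascar; the function name, the fixed (zero) end points and the zero initial velocity are assumptions. The Courant number c*dt/dx must not exceed 1 for this explicit scheme to remain stable.

```python
from typing import List


def fd_wave_1d(u0: List[float], c: float, dx: float, dt: float,
               steps: int) -> List[float]:
    """Explicit 2nd-order finite differences for u_tt = c^2 u_xx in 1D.

    Assumes zero initial velocity and fixed (u = 0) end points."""
    r2 = (c * dt / dx) ** 2  # squared Courant number
    if r2 > 1.0:
        raise ValueError("CFL condition violated: need c*dt/dx <= 1")
    n = len(u0)
    prev = list(u0)
    if steps == 0:
        return prev
    # First step from a Taylor expansion with du/dt = 0 at t = 0.
    cur = [0.0] * n
    for i in range(1, n - 1):
        cur[i] = prev[i] + 0.5 * r2 * (prev[i + 1] - 2 * prev[i] + prev[i - 1])
    for _ in range(steps - 1):
        nxt = [0.0] * n  # end points stay 0 (Dirichlet boundaries)
        for i in range(1, n - 1):
            lap = cur[i + 1] - 2 * cur[i] + cur[i - 1]
            nxt[i] = 2 * cur[i] - prev[i] + r2 * lap
        prev, cur = cur, nxt
    return cur
```

With c*dt/dx = 1 the scheme reproduces d'Alembert's solution on the grid: a unit spike at the centre of a 101-point grid splits into two pulses of height 0.5 that move one cell per step in opposite directions, so after 10 steps they sit at indices 40 and 60.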

Jun, 8

Efficient 3D Isotropic Volume Reconstruction Based On 2D Localized Ultrasound Images

A miniature 3D tracked ultrasonic probe has been developed to acquire intra-articular cartilage images under arthroscopic surgical conditions. The aim is to detect cartilaginous lesions (arthritis) and quantify their precise sizes and locations to help clinicians with diagnosis and therapeutic decision-making. The ultrasonic transducer is tracked by an optical sensor, which […]

CUDA

Jun, 8

Review and Comparative Study of Ray Traversal Algorithms on a Modern GPU Architecture

In this paper we present a chronological review of five distinct data structures commonly found in the literature and in ray tracing systems: Bounding Volume Hierarchies (BVH), Octrees, Uniform Grids, KD-Trees, and Bounding Interval Hierarchies (BIH). This review is then followed by an extensive comparative study of six different ray traversal algorithms implemented on a modern Kepler […]

CUDA
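
Several of the structures reviewed here, BVHs in particular, repeatedly test rays against axis-aligned bounding boxes during traversal. The slab test below is the standard way to perform that test; the Python function name and the convention of returning the parametric interval (t_near, t_far) are illustrative choices, and the code is not taken from the paper's CUDA kernels.

```python
import math
from typing import Optional, Sequence, Tuple


def ray_aabb(origin: Sequence[float], direction: Sequence[float],
             box_min: Sequence[float],
             box_max: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Slab test: returns the (t_near, t_far) interval where the ray
    origin + t * direction (t >= 0) is inside the box, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this slab: it must already lie between the planes.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near, t_far
```

GPU traversal kernels often precompute the reciprocal of each direction component once per ray, so the per-node test uses multiplications instead of divisions.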

Jun, 7

International Conference on VLSI Systems, Architecture, Technology and Applications, VLSI-SATA 2015

International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA 2015) will be held at the Amrita Viswa Vidyapeetham (University), School of Engineering, Bengaluru Campus during January 8-10, 2015. The conference will serve as an annual forum for researchers, academicians, and practitioners from around the world to present their current theoretical research efforts, system and […]

Jun, 7

Implementation and Experimental Evaluation of a CUDA Core under Single Event Effects

Graphics Processing Units have become popular in a broad range of applications due to their high computational power and low prices. Among these applications are safety-critical ones, where fault tolerance is mandatory. This paper presents the implementation of a CUDA core, the main processing core of a GPU, and its evaluation under Single […]

CUDA
