high performance computing on graphics processing units: hgpu.org

Posts

Sep, 7

High Performance Parallel Implementation of Compressive Sensing SAR Imaging

The compressive sensing (CS) theory has been applied to SAR imaging systems in many ways. And it shows a significant reduction in the amount of sampling data at the cost of much longer reconstruction time. In this paper, we investigate the development and optimization of Iterative Shrinkage/Thresholding (IST) algorithm applying to CS reconstruction of SAR […]

CUDA

Sep, 7

Multi-Moment Methods for PDEs and GPUs for Large-Scale Scientific Computations

The scope of this thesis is broad but focuses on developing effective numerical methods and efficient implementations. We investigate numerical solution methods for hyperbolic partial differential equations, numerical optimization methods, and implementation of fast numerical algorithms on graphics processor units (GPUs). For partial differential equations we develop numerical methods for the transport and advection equations. […]

CUDA

Sep, 7

An Efficient Acceleration of Digital Forensics Search Using GPGPU

Graphics Processing Units (GPU) have been the extensive research topic in recent years and have been successfully applied to general purpose applications other than computer graphical area. The nVidia CUDA programming model provides a straightforward means of describing inherently parallel computations. In this paper, we present a study of the efficiency of emerging technology in […]

CUDA

Sep, 7

GPU Computing and CUDA technology used to accelerate a mesh generator application

The potential of GPU computing used in general purpose parallel programming has been amply shown. These massively parallel many-core multiprocessors are available to any user in every PC, notebook, game console or workstation. In this work, we present the parallel version of a mesh-generating algorithm and its execution time reduction by using off-the-shelf GPU technology. […]

CUDA

Sep, 6

Power and Performance Analysis of GPU-Accelerated Systems

Graphics processing units (GPUs) provide significant improvements in performance and performance-per-watt as compared to traditional multicore CPUs. This energy-efficiency of GPUs has facilitated the use of GPUs in many application domains. Albeit energy efficient, GPUs consume non-trivial power independently of CPUs. Therefore, we need to analyze the power and performance characteristic of GPUs and their […]

CUDA

Sep, 6

GPU-Accelerated First-Order Scattering Simulation for X-Ray CT Image Reconstruction

In recent years the GPU has become an increasingly popular tool in various fields. In this paper, we will introduce our preliminary work on first-order scatter simulation in X-ray imaging accelerated by GPU. As this is preliminary work, we explore the GPU accelerated scattering simulation in 2D space and test it with physics-based simulated data. […]

CUDA

Sep, 6

GPU Acceleration of BCP Procedure for SAT Algorithms

The satisfiability problem (SAT) is widely applicable and one of the most basic NP-complete problems. This problem has been required to be solved as fast as possible because of its significance, but it takes exponential time in the worst case to solve. Therefore, we aim to save the computation time by parallel computing on a […]

CUDA

Sep, 6

Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking

Emergence of new Graphical Processors for general purpose computing presents new challenges for application developers. Graphical Processors vary in terms of number of processor cores per chip, processor speed and memory subsystems. NVIDIA’s CUDA provides a C-like abstraction layer for software developers to implement their applications on GPUs often with little knowledge of the underlying […]

CUDA

Sep, 6

CudaGIS: Report on the Design and Realization of a Massive Data Parallel GIS on GPUs

We report the design and realization of a high-performance parallel GIS, i.e., CudaGIS, based on the General Purpose computing on Graphics Processing Units (GPGPU) technologies. Still under active developments, CudaGIS currently supports major types of geospatial data (point, polyline, polygon and raster) and provides modules for spatial indexing, spatial join and other types of geospatial […]

CUDA

Sep, 5

GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs

BACKGROUND: The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data are craving for extreme computational power and special computing facilities (i.e. super-computers). An inexpensive solution, such as General Purpose computation based on […]

CUDA

Sep, 5

Multi-user real-time speech recognition with a GPU

We have developed a multi-user large vocabulary speech recognition system employing a fully composed one-level weighted finite state transducer (WFST) based network on a Graphics Processing Unit (GPU). This system improves the overall throughput and latency of speech recognition engine which processes multiple users’ utterances at the same time with efficient scheduling, parameter sharing, and […]

CUDA

Sep, 5

Accelerating and Characterizing Seam Carving Using a Heterogeneous CPU-GPU System

Seam carving has been widely used for content-aware resizing of images and videos with little to no perceptible distortion. Unfortunately, for high-resolution videos and large images it becomes computationally unfeasible to do the resizing in real-time using small-scale CPU systems. In this paper, we exploit the highly parallel computational capabilities of CUDA-enabled Graphics Processing Units […]

CUDA
