high performance computing on graphics processing units: hgpu.org

Posts

Oct, 30

GPU Accelerated Blood Flow Computation using the Lattice Boltzmann Method

We propose a numerical implementation based on a Graphics Processing Unit (GPU) to reduce the execution time of the Lattice Boltzmann Method (LBM). The study focuses on the application of the LBM to patient-specific blood flow computations; hence, to obtain higher accuracy, double-precision computations are employed. The LBM-specific operations are […]

CUDA

Oct, 30

GPU-Based Image Segmentation Using Level Set Method With Scaling Approach

In recent years, with the development of graphics processors, graphics cards have been widely used to perform general-purpose calculations. Especially since the release of the CUDA C programming language in 2007, most researchers have used CUDA C for processes that need high-performance computing. In this paper, a scaling approach for […]

CUDA

Oct, 30

An Evolutionary Approach to Parallel Computing Using GPU

In a few years, the programmable graphics processing unit has evolved into a high-performance computing device. Simple data-parallel constructs enable the use of the GPU as a streaming coprocessor, and a compiler and runtime system abstracts and virtualizes many aspects of graphics hardware. Commodity graphics hardware has rapidly evolved from being a fixed-function pipeline […]

CUDA

Oct, 30

The Plasma Simulation Code: A modern particle-in-cell code with load-balancing and GPU support

Recent increases in supercomputing power, driven by the multi-core revolution and accelerators such as the IBM Cell processor, graphics processing units (GPUs) and Intel’s Many Integrated Core (MIC) technology, have enabled kinetic simulations of plasmas at unprecedented resolutions, but changing HPC architectures also come with challenges for writing efficient numerical codes. This paper describes the […]

CUDA

Oct, 30

Analysis of Parallel Sorting Algorithms on Heterogeneous Processors with OpenCL

A heterogeneous computing platform with tremendous raw capacity can be easily constructed thanks to the availability of multi-core processors, high-capacity FPGAs and GPUs, and can include any number of these computing units. However, the challenge faced until now was the lack of a standardized framework under which the computational tasks and data of applications could […]

OpenCL

Oct, 29

Morphological Proximity Priors: Spatial Relationships for Semantic Segmentation

The introduction of prior knowledge into image analysis algorithms is a central challenge in computer vision. In this paper, we introduce the concept of proximity priors into semantic segmentation methods in order to penalize the proximity of certain object classes. Proximity priors are a generalization of purely global and purely local co-occurrence priors which have […]

CUDA

Oct, 29

A Comparison of Two Methods for Geometric Milling Simulation Accelerated by GPU

For detecting potential problems of a cutter path, cutting force simulation in the NC milling process is necessary prior to actual machining. A milling operation is geometrically equivalent to a Boolean subtraction of the swept volume of a cutter moving along a path from a solid model representing the stock shape. In order to precisely […]

CUDA

Oct, 29

GPU-Mapping: Robotic Map Building with Graphical Multiprocessors

This paper provides a wide perspective on the potential applicability of the computing power of Graphical Processing Units (GPUs) in robotics, specifically in the well-known problem of 2D robotic mapping. There are three possible ways of exploiting these massively parallel devices: I) parallelizing existing algorithms, II) integrating already existing parallelized general purpose software, and III) making […]

CUDA

Oct, 29

First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA’s Tesla Graphics Processing […]

CUDA

Oct, 29

Extension of the SkePU Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linköpings universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and is targeted to multiple backends, e.g. for a sequential CPU, […]

CUDA • OpenCL

Oct, 29

High-performance Dynamic Programming on FPGAs with OpenCL

Field programmable gate arrays (FPGAs) provide reconfigurable computing fabrics that can be tailored to a wide range of time- and power-sensitive applications. Traditionally, programming FPGAs required expertise in complex hardware description languages (HDLs) or proprietary high-level synthesis (HLS) tools. Recently, Altera released the world's first OpenCL-conformant SDK for FPGAs. OpenCL is an […]

OpenCL

Oct, 29

Molecular Simulations using CUDA

Computer simulations play a vital role in understanding the phase behavior of colloidal dispersions; however, most simulation results suffer from finite-size effects. These finite-size effects can be eliminated by finite-size scaling or by simulating large system sizes. In this thesis we show how to simulate large system sizes efficiently on Graphical Processing Units (GPUs). Whereas […]

CUDA
