high performance computing on graphics processing units: hgpu.org

Posts

Apr, 1

Parallelization of the Cuckoo Search using CUDA Architecture

Cuckoo Search is one of the recent swarm intelligence metaheuristics. It has been successfully applied to a number of optimization problems but is still not very well researched. In this paper we present a parallelized version of the Cuckoo Search algorithm. The parallelization is implemented using the CUDA architecture. The algorithm is significantly changed compared to […]

CUDA

Apr, 1

OpenCL parallel Processing using General Purpose Graphical Processing units – TiViPE software development

The aim of this report is to elaborate on TiViPE modules that make use of Open Computing Language (OpenCL) programming. OpenCL is available in TiViPE from version 2.1.0. The aim of TiViPE is to integrate different technologies in a seamless way using graphical icons [1]. Due to these icons the user does not need to have in […]

CUDA • OpenCL

Apr, 1

Geometric Algebra Computing Technology for Accelerated Processing Units

Development on embedded devices, even on today’s hardware, limits us to a minimum of third-party library dependencies due to hardware memory and power restrictions. In setups requiring intense geometric operations on limited hardware, such as in robotics, this problem can often lead to a tedious reimplementation of matrix, vector, and quaternion operations. Furthermore, certain unnecessary […]

CUDA • OpenCL

Apr, 1

A Discussion of Selected Vienna-Libraries for Computational Science

We address the low popularity of C++ in computational science by introducing a set of orthogonal libraries: The CUDA-, OpenCL-, and OpenMP-enabled linear algebra library ViennaCL, the mesh datastructure library ViennaGrid, a data storage facility named ViennaData, and the symbolic math kernel ViennaMath. Finally, we discuss how these orthogonal components interact within the finite element […]

CUDA • OpenCL

Mar, 31

Formal Analysis of GPU Programs with Atomics via Conflict-Directed Delay-Bounding

GPU based computing has made significant strides in recent years. Unfortunately, GPU program optimizations can introduce subtle concurrency errors, and so incisive formal bug-hunting methods are essential. This paper presents a new formal bug-hunting method for GPU programs that combine barriers and atomics. We present an algorithm called conflict-directed delay-bounded scheduling algorithm (CD) that exploits […]

CUDA

Mar, 31

Specification and Verification of GPGPU Programs using Permission-Based Separation Logic

Graphics Processing Units (GPUs) are increasingly used for general-purpose applications because of their low price, energy efficiency and enormous computing power. Considering the importance of GPU applications, it is vital that the behaviour of GPU programs can be specified and formally proven correct. This paper presents our ideas on how to verify GPU programs written in […]

OpenCL

Mar, 31

A journey from single-GPU to optimized multi-GPU SPH with CUDA

We present an optimized multi-GPU version of GPUSPH, a CUDA implementation of fluid-dynamics models based on the Smoothed Particle Hydrodynamics (SPH) numerical method. SPH is a well-known Lagrangian model for the simulation of free-surface fluid flows; it exposes a high degree of parallelism and has already been successfully ported to GPU. We extend the GPU-based […]

CUDA

Mar, 31

A Massively Parallel Associative Memory Based on Sparse Neural Networks

Associative memories store content in such a way that the content can be later retrieved by presenting the memory with a small portion of the content, rather than presenting the memory with an address as in more traditional memories. Associative memories are used as building blocks for algorithms within database engines, anomaly detection systems, compression […]

CUDA

Mar, 31

CMCpy: Genetic Code-Message Coevolution Models in Python

Code-message coevolution (CMC) models represent coevolution of a genetic code and a population of protein-coding genes ("messages"). Formally, CMC models are sets of quasispecies coupled together for fitness through a shared genetic code. Although CMC models display plausible explanations for the origin of multiple genetic code traits by natural selection, useful modern implementations of CMC […]

CUDA

Mar, 29

High Performance Computing using GPGPU’s

Computer-based simulation software grounded in numerical methods plays a major role in research in the natural and physical sciences. These tools allow scientists to attempt problems that are too large to solve using analytical methods. But even these tools can fail to give solutions due to computational or storage limits. […]

OpenCL

Mar, 29

Warp Size Impact in GPUs: Large or Small?

A number of design decisions impact a GPU’s performance. Among them, choosing the right warp size can deeply influence the rest of the design. Small warps reduce the performance penalty associated with branch divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also […]

Mar, 29

Graphics Processing Unit Acceleration of the Explicit Solution of the Time Domain Volume Integral Equation Using OpenACC

A graphics processing unit (GPU) accelerated implementation of the explicit solution of the time domain volume integral equation (TD-VIE) using the OpenACC application program interface (API) is presented. The use of the OpenACC API, which is based on a collection of compiler directives, allows for ease of porting as well as the efficient […]

Recent source codes

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Data-efficient LLM Fine-tuning for Code Generation

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework

GigaAPI: a user-space API that simplifies multi-GPU programming, bridging the gap between the capabilities of parallel GPU systems and the ability of developers to harness their full potential

Coccinelle: a C code transformation engine using SmPL for matches, refactorings, and bug fixing

DuoReduce: MLIR's benchmark

Shamrock: Multi-GPU hydrodynamics for astrophysics

LLMPerf: GPU Performance Modeling meets Large Language Models

Most viewed papers (last 30 days)