high performance computing on graphics processing units: hgpu.org

Posts

Jan, 8

Hybrid Algorithms for List Ranking and Graph Connected Components

The advent of multicore and many-core architectures saw them being deployed to speed up computations across several disciplines and application areas. Prominent examples include semi-numerical algorithms such as sorting, graph algorithms, image processing, scientific computations, and the like. In particular, using GPUs for general purpose computations has attracted a lot of attention given that GPUs can […]

CUDA

Jan, 8

Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems

This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures – multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the road for their efficient use, we must resolve […]

CUDA

Jan, 8

Parameter Tuning of a Hybrid Treecode-FMM on GPUs

Treecodes are O(N log N) hierarchical N-body algorithms, which have traditionally been used for applications in astrophysics, in a low-accuracy regime. Fast multipole methods (FMM) are O(N) hierarchical N-body algorithms that have been used in a variety of applications, often in the high-accuracy regime. Both algorithms are known to perform well on massively parallel heterogeneous […]

CUDA

Jan, 8

A Quasi-Parallel GPU-Based Algorithm for Delaunay Edge-Flips

The Delaunay edge-flip algorithm is a practical method for transforming any existing triangular mesh S into a mesh T(S) that satisfies the Delaunay condition. Although several implementations of this algorithm are known, to the best of our knowledge no parallel GPU-based implementation has been reported yet. In the present work, we propose a quadriphasic and […]

CUDA, OpenGL

Jan, 8

Direct solution of the Boltzmann equation for a binary mixture on GPUs

We show how to accelerate the numerical solution of the Boltzmann equation for a binary gas mixture by using Graphics Processing Units (GPUs). In order to fully exploit the computational power of the GPU, we adopt a semi-regular method of solution which combines a finite difference discretization of the free-streaming term with a Monte Carlo […]

CUDA

Jan, 8

Massively Parallel Sequential Monte Carlo for Bayesian Inference

This paper reconsiders sequential Monte Carlo approaches to Bayesian inference in the light of massively parallel desktop computing capabilities now well within the reach of individual academics. It first develops an algorithm that is well suited to parallel computing in general and for which convergence results have been established in the sequential Monte Carlo literature […]

CUDA

Jan, 8

Some Graph Algorithms And Related Primitives For The GPU

General purpose computing on graphics processor units (GPGPU) has attained widespread acceptance in the high-performance computing community. This has largely been attributed to the rise of programming models and large peak performance to cost ratio of the GPU. The peak throughput of modern GPUs is typically 5 TFLOPS at a cost of 600 US […]

CUDA

Jan, 8

Implementation of Kd-Trees on the GPU to Achieve Real Time Graphics Processing

This paper examines the parallelization of ray tracing algorithms with the goal of running the whole process on the graphics processing unit (GPU) rather than the central processing unit (CPU). The motivation behind this endeavour is to utilize the massively parallel nature of the GPU. This parallelism allows the construction of 3-dimensional images to take […]

Jan, 8

A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization

In this paper, we try to accelerate sparse LU factorization on GPU. We present a tiled storage format and a parallel algorithm to improve the memory access pattern, and a register blocking method to compress the on-chip working set. The OpenMP implementation of our algorithm gives more stable performance over different matrices, and outperforms SuperLU […]

CUDA

Jan, 8

Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware

The block cipher Rijndael has undergone more than ten years of extensive cryptanalysis since its submission as a candidate for the Advanced Encryption Standard (AES) in April 1998. To date, most of the publicly-known cryptanalytic results are based on reduced-round variants of the AES (respectively Rijndael) algorithm. Among the few exceptions that target the full […]

CUDA

Jan, 7

Report on the Feasibility of Implementing PIC Codes on a GPU

GPUs have become a very attractive supplement to traditional high performance computing. GPUs have significantly better performance per cost and power consumption. However, GPUs introduce several additional levels of parallelism that must be contended with. New methods must be developed in order to take full advantage of the capabilities of this architecture. This paper explores […]

CUDA

Jan, 7

Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

We explore two different threading approaches on a graphics processing unit (GPU) exploiting two different characteristics of the current GPU architecture. The fat thread approach tries to minimize data access time by relying on shared memory and registers potentially sacrificing parallelism. The thin thread approach maximizes parallelism and tries to hide access latencies. We apply […]

CUDA

