high performance computing on graphics processing units: hgpu.org

Posts

Dec, 21

Efficient Probabilistic and Geometric Anatomical Mapping Using Particle Mesh Approximation on GPUs

Deformable image registration in the presence of considerable contrast differences and large size and shape changes presents significant research challenges. First, it requires a robust registration framework that does not depend on intensity measurements and can handle large nonlinear shape variations. Second, it involves the expensive computation of nonlinear deformations with high degrees of freedom. […]

CUDA

Dec, 21

A simulation suite for lattice Boltzmann based real time CFD applications exploiting multi-level parallelism on modern multi- and many-core architectures

We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published set of open-source libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise efficiency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared […]

CUDA

Dec, 21

Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU

Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large number of replications, it is good practice to reduce the overall simulation time using the Multiple Replications In Parallel (MRIP) approach. This approach usually assumes access to a parallel computer […]

CUDA

Dec, 21

A Normalized Particle Swarm Optimization Algorithm to Price Complex Chooser Option and Accelerating its Performance with GPU

An option is a financial instrument which derives its value from an underlying asset. There is a wide range of options traded today. Some are simple and plain, like the European options, while others are very difficult to evaluate. Both buyers and sellers continue to look for efficient algorithms and faster technology to price options […]

CUDA

Dec, 21

Parallel Inference on Structured Data with CRFs on GPUs

Structured real world data can be represented with graphs whose structure encodes independence assumptions within the data. Due to statistical advantages over generative graphical models, Conditional Random Fields (CRFs) are used in a wide range of classification tasks on structured data sets. CRFs can be learned from both fully and partially supervised data, and may […]

CUDA

Dec, 21

Generating SU(Nc) pure gauge lattice QCD configurations on GPUs with CUDA and OpenMP

The starting point of any lattice QCD computation is the generation of a Markov chain of gauge field configurations. Due to the large number of lattice links and due to the matrix multiplications, generating SU(Nc) lattice QCD configurations is a highly demanding computational task, requiring advanced parallel computer architectures such as clusters of several Central […]

CUDA

Dec, 21

Implementation of a Parallel Tree Method on a GPU

The kd-tree is a fundamental tool in computer science. Among other applications, the application of kd-tree search (by the tree method) to the fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2) for a brute force method to O(N log N) for […]

OpenCL

Dec, 20

Performance and Quality of Random Number Generators

Random number generation continues to be a critical component in much of computational science and the tradeoff between quality and computational performance is a key issue for many numerical simulations. We review the performance and statistical quality of some well known algorithms for generating pseudo random numbers. Graphical Processing Units (GPUs) are a powerful platform […]

CUDA

Dec, 20

GPU Accelerated PK-means Algorithm for Gene Clustering

In this paper, a novel GPU accelerated scheme for the PK-means gene clustering algorithm is proposed. According to the native particle-pair structure of the PK-means algorithm, a fragment shader program is tailor-made to process a pair of particles in one pass for the computation-intensive portion. As the output channel of a fragment consisting of 4 […]

CUDA

Dec, 20

A Framework for Genetic Algorithms in Parallel Environments

In this research, we developed a framework to execute genetic algorithms (GA) in various parallel environments. GA researchers can prepare implementations of GA operators and fitness functions using this framework. We have prepared several types of communication library in various parallel environments. Combining GA implementations and our libraries, GA researchers can benefit from parallel processing […]

CUDA

Dec, 20

Parallel Contour-Buildup Algorithm for the Molecular Surface

Molecular Dynamics simulations are an essential tool for many applications. The simulation of large molecules – like proteins – over long trajectories is of high importance, e.g., for pharmaceutical, biochemical and medical research. For analyzing these data sets, interactive visualization plays a crucial role as details of the interactions of molecules are often affected […]

CUDA

Dec, 20

Analysis of GPGPU Platforms Efficiency in General-Purpose Computations

The use of graphics processing units (GPUs) for general-purpose computing (GPGPU) is becoming increasingly widespread. The goal of this paper is to analyze the efficiency of GPGPU computing as a function of several factors. The paper analyzes differences in performance and platform organization between widespread […]

OpenCL
