high performance computing on graphics processing units: hgpu.org

Posts

Nov, 27

Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA

Computing large multiple protein sequence alignments using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. ClustalW uses a three-stage processing pipeline: (i) pairwise distance computation; (ii) phylogenetic tree reconstruction; and (iii) progressive multiple alignment computation. Previous work on accelerating ClustalW was mainly focused on parallelizing the first stage and achieved good […]

CUDA

Nov, 27

Towards Accelerated Computation of Atmospheric Equations Using CUDA

The main objective of this paper is to outline possible ways to achieve a substantial acceleration in the case of advection-diffusion equation (A-DE) calculation, which is commonly used to describe pollutant behavior in the atmosphere. The A-DE is a kind of partial differential equation (PDE), and in the general case it is usually solved by numerical integration due to its high complexity. These […]

CUDA

Nov, 27

Boids that see: Using self-occlusion for simulating large groups on GPUs

Behavioral models have been used in the entertainment industry to increase the realism in the simulation of large groups of individuals. Unfortunately, the classical models can be very compute-intensive when very large groups are considered, reducing their applicability in games and other interactive systems. In this article we explore both search space reduction and parallelism […]

CUDA

Nov, 27

Hierarchical Markov Random Fields Applied to Model Soft Tissue Deformations on Graphics Hardware

Many methodologies dealing with prediction or simulation of soft tissue deformations on medical image data require preprocessing of the data in order to produce a different shape representation that complies with standard methodologies, such as mass-spring networks, finite element methods (FEM). On the other hand, methodologies working directly on the image space normally do […]

OpenGL

Nov, 27

An emotionally biased ant colony algorithm for pathfinding in games

Pathfinding is one of the tasks, apart from graphics rendering, requiring most CPU resources. Although there are many approaches to effectively solve pathfinding problems, they are becoming less suitable as more and more games have larger game worlds that dynamically change during the game play. These new games have more visually realistic graphics that increase […]

Nov, 27

Particle-Based Multiple Irregular Volume Rendering on CUDA

In this paper, we describe an improved particle-based volume rendering (PBVR) technique for previewing a large irregular volume dataset using the CUDA architecture. This technique allows for opaque and emissive particles to render translucent volumes without visibility sorting. Our GPU acceleration of PBVR provides the multi-volume rendering feature while remaining compatible with both regular and […]

CUDA

Nov, 27

Fast Conjugate Gradients with Multiple GPUs

The limiting factor for efficiency of sparse linear solvers is the memory bandwidth. In this work, we describe a fast Conjugate Gradient solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs, using a mixed precision iterative refinement algorithm. To achieve […]

CUDA

Nov, 27

A Note on Auto-tuning GEMM for GPUs

The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is especially true for Graphics Processing Units (GPUs), as evidenced by recently published results on DLA for GPUs that rely on highly optimized GEMM. However, the current best GEMM performance, e.g. […]

CUDA

Nov, 27

Efficient Parallelization of Stochastic Simulation Algorithm for Chemically Reacting Systems on the Graphics Processing Unit

The small number of some reactant molecules in biological systems formed by living cells can result in dynamical behavior which cannot be captured by traditional deterministic models. In such a problem, a more accurate simulation can be obtained with discrete stochastic simulation (Gillespie’s stochastic simulation algorithm – SSA). Many stochastic realizations are required to capture […]

Nov, 27

Parallel View-Dependent Level-of-Detail Control

We present a scheme for view-dependent level-of-detail control that is implemented entirely on programmable graphics hardware. Our scheme selectively refines and coarsens an arbitrary triangle mesh at the granularity of individual vertices to create meshes that are highly adapted to dynamic view parameters. Such fine-grain control has previously been demonstrated using sequential CPU algorithms. However, […]

Nov, 27

Real-time virtual environment signal extraction and denoising using programmable graphics hardware

The sense of being within a three-dimensional (3D) space and interacting with virtual 3D objects in a computer-generated virtual environment (VE) often requires essential image, vision and sensor signal processing techniques such as differentiating and denoising. This paper describes novel implementations of the Gaussian filtering for characteristic signal extraction and wavelet-based image denoising algorithms […]

OpenGL

Nov, 27

Implications of the Turing completeness of reaction-diffusion models, informed by GPGPU simulations on an XBox 360: cardiac arrhythmias, re-entry and the Halting problem

In the arsenal of tools that a computational modeller can bring to bear on the study of cardiac arrhythmias, the most widely used and arguably the most successful is that of an excitable medium, a special case of a reaction-diffusion model. These are used to simulate the internal chemical reactions of a cardiac cell and […]
