high performance computing on graphics processing units: hgpu.org

Posts

Nov, 27

Evenly Spaced Streamlines for Surfaces: An Image-Based Approach

We introduce a novel, automatic streamline seeding algorithm for vector fields defined on surfaces in 3D space. The algorithm generates evenly spaced streamlines fast, simply and efficiently for any general surface-based vector field. It is general because it handles large, complex, unstructured, adaptive resolution grids with holes and discontinuities, does not require a parametrization, […]

Nov, 27

Accelerating error correction in high-throughput short-read DNA sequencing data with CUDA

Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, produced reads are significantly shorter and more error-prone compared to the traditional Sanger shotgun sequencing method. This poses challenges for de-novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, […]

CUDA

Nov, 27

Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA

Computing large multiple protein sequence alignments using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. ClustalW uses a three-stage processing pipeline: (i) pairwise distance computation; (ii) phylogenetic tree reconstruction; and (iii) progressive multiple alignment computation. Previous work on accelerating ClustalW was mainly focused on parallelizing the first stage and achieved good […]

CUDA

Nov, 27

Towards Accelerated Computation of Atmospheric Equations Using CUDA

Main objective of this paper is to outline possible ways how to achieve a substantial acceleration in case of advection-diffusion equation (A-DE) calculation, which is commonly used for a description of the pollutant behavior in atmosphere. A-DE is a kind of partial differential equation (PDE) and in general case it is usually solved by numerical integration due to its high complexity. These […]

CUDA

Nov, 27

Boids that see: Using self-occlusion for simulating large groups on GPUs

Behavioral models have been used in the entertainment industry to increase the realism in the simulation of large groups of individuals. Unfortunately, the classical models can be very compute-intensive when very large groups are considered, reducing its applicability in games and other interactive systems. In this article we explore both search space reduction and parallelism […]

CUDA

Nov, 27

Hierarchical Markov Random Fields Applied to Model Soft Tissue Deformations on Graphics Hardware

Many methodologies dealing with prediction or simulation of soft tissue deformations on medical image data require preprocessing of the data in order to produce a different shape representation that complies with standard methodologies, such as mass-spring networks, finite element methods (FEM). On the other hand, methodologies working directly on the image space normally do […]

OpenGL

Nov, 27

An emotionally biased ant colony algorithm for pathfinding in games

Pathfinding is one of the tasks, apart from graphics rendering, requiring most CPU resources. Although there are many approaches to effectively solve pathfinding problems, they are becoming less suitable as more and more games have larger game worlds that dynamically change during the game play. These new games have more visually realistic graphics that increase […]

Nov, 27

Particle-Based Multiple Irregular Volume Rendering on CUDA

In this paper, we describe an improved particle-based volume rendering (PBVR) technique for previewing a large irregular volume dataset using the CUDA architecture. This technique allows for opaque and emissive particles to render translucent volumes without visibility sorting. Our GPU acceleration of PBVR provides the multi-volume rendering feature while remaining compatible with both regular and […]

CUDA

Nov, 27

Fast Conjugate Gradients with Multiple GPUs

The limiting factor for efficiency of sparse linear solvers is the memory bandwidth. In this work, we describe a fast Conjugate Gradient solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs, using a mixed precision iterative refinement algorithm. To achieve […]

CUDA

Nov, 27

A Note on Auto-tuning GEMM for GPUs

The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is especially true for Graphics Processing Units (GPUs), as evidenced by recently published results on DLA for GPUs that rely on highly optimized GEMM. However, the current best GEMM performance, e.g. […]

CUDA

Nov, 27

Efficient Parallelization of Stochastic Simulation Algorithm for Chemically Reacting Systems on the Graphics Processing Unit

The small number of some reactant molecules in biological systems formed by living cells can result in dynamical behavior which cannot be captured by traditional deterministic models. In such a problem, a more accurate simulation can be obtained with discrete stochastic simulation (Gillespie’s stochastic simulation algorithm – SSA). Many stochastic realizations are required to capture […]

Nov, 27

Parallel View-Dependent Level-of-Detail Control

We present a scheme for view-dependent level-of-detail control that is implemented entirely on programmable graphics hardware. Our scheme selectively refines and coarsens an arbitrary triangle mesh at the granularity of individual vertices to create meshes that are highly adapted to dynamic view parameters. Such fine-grain control has previously been demonstrated using sequential CPU algorithms. However, […]

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
