high performance computing on graphics processing units: hgpu.org

Posts

Mar, 12

Reduced Vlasov-Maxwell simulations

In this paper we review two different numerical methods for Vlasov-Maxwell simulations. The first method is based on a coupling between a Discontinuous Galerkin (DG) Maxwell solver and a Particle-In-Cell (PIC) Vlasov solver. The second method only uses a DG approach for the Vlasov and Maxwell equations. The Vlasov equation is first reduced to a […]

OpenCL

Mar, 12

Genetically Improved CUDA kernels for StereoCamera

Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speedups of more than six times on recent NVIDIA GPU cards are reported compared to the original kernel on the […]

CUDA

Mar, 12

Efficient Preconditioned Conjugate Gradient Parallelization on GPU

We present a performance analysis of a parallel implementation of both conjugate gradient and preconditioned conjugate gradient solvers using graphics processing units with the CUDA parallel programming model. The solvers were optimized for fast solution of sparse systems of equations arising from Finite Element Analysis (FEA) of electromagnetic phenomena. The preconditioners were Incomplete Cholesky factorization […]

CUDA
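
As a rough illustration of the preconditioned conjugate gradient iteration named in the entry above, here is a minimal CPU-only sketch. It is not the paper's implementation: it swaps in a simple Jacobi (diagonal) preconditioner instead of Incomplete Cholesky, uses a dense list-of-rows matrix rather than a sparse format, and the function name `pcg_jacobi` and its parameters are hypothetical.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A (list of rows).

    Assumption: a Jacobi (diagonal) preconditioner stands in for the
    Incomplete Cholesky preconditioner mentioned in the abstract.
    """
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:               # stop once the residual is small
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                       # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(pcg_jacobi(A, b))
```

On a GPU, the matrix-vector product and the dot products are the operations that would typically be parallelized; the loop structure itself stays sequential.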

Mar, 12

MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence

Exact short read mapping to whole genomes with the Smith-Waterman algorithm is computationally expensive yet highly accurate when aligning reads with mismatches and gaps. We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to mainstream approaches MaxSSmap identifies a local region of the […]

CUDA
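
The "maximum scoring subsequence" in the title of the entry above can be illustrated with a short sketch. This is only a guess at the general idea, not MaxSSmap's actual method: it assumes an ungapped, position-by-position comparison of a read against an equally long reference window, and the scoring values and the function names `match_scores` and `max_scoring_subsequence` are hypothetical.

```python
def match_scores(read, ref, match=1, mismatch=-1):
    """Score each aligned position: +match if bases agree, else mismatch.

    Assumption: ungapped comparison with illustrative score values.
    """
    return [match if a == b else mismatch for a, b in zip(read, ref)]


def max_scoring_subsequence(scores):
    """Return (best_sum, start, end) for the contiguous run with the
    largest total score; end is exclusive. An empty run scores 0."""
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, s in enumerate(scores):
        if run_sum <= 0:
            # A non-positive prefix can never help, so restart the run here.
            run_sum, run_start = s, i
        else:
            run_sum += s
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_sum, best_start, best_end


if __name__ == "__main__":
    scores = match_scores("ACGTACGT", "ACGTTCGT")
    print(max_scoring_subsequence(scores))
```

The scan runs in linear time per window, which is what makes evaluating many candidate windows independently, one per GPU thread for example, plausible.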

Mar, 10

OpenCL-Accelerated Simplified General Perturbations 4 Algorithm

The number of space objects such as satellites, spacecraft, and debris is increasing significantly, and so is the need for tracking them for security and collision avoidance purposes. In this context, as parallelism is becoming a new paradigm, the need for high performance propagators remains unmet. For this, we implemented Simplified General Perturbations No. […]

OpenCL

Mar, 10

GPU-EvR: Run-time Event Based Real-time Scheduling Framework on GPGPU Platform

GPU architecture has traditionally been used in graphics applications because of its enormous computing capability. More recently, GPU architecture has also been used for general purpose computing. Most current scheduling frameworks developed to handle GPGPU workloads operate sequentially. This is problematic since this sequential approach may not be scalable […]

Mar, 10

Massively parallel read mapping on GPUs with PEANUT

We present PEANUT (ParallEl AligNment UTility), a highly parallel GPU-based read mapper with several distinguishing features, including a novel q-gram index (called the q-group index) with a small memory footprint built on-the-fly over the reads and the possibility to output either the best hits or all hits of a read. Designing the algorithm particularly for the […]

OpenCL

Mar, 10

GPU Accelerated Discontinuous Galerkin Methods for Shallow Water Equations

We discuss the development, verification, and performance of a GPU accelerated discontinuous Galerkin method for the solution of two-dimensional nonlinear shallow water equations. The shallow water equations are hyperbolic partial differential equations and are widely used in the simulation of tsunami wave propagation. Our algorithms are tailored to take advantage of the single instruction […]

CUDA, OpenCL

Mar, 10

A GPU Accelerated Aggregation Algebraic Multigrid Method

We present an efficient, robust and fully GPU-accelerated aggregation-based algebraic multigrid preconditioning technique for the solution of large sparse linear systems. These linear systems arise from the discretization of elliptic PDEs. The method involves two stages, setup and solve. In the setup stage, hierarchical coarse grids are constructed through aggregation of the fine grid nodes. […]

CUDA

Mar, 9

XeonPhi Meets Astrophysical Fluid Dynamics

This white paper reports on our efforts to optimize a 2D/3D astrophysical (magneto-)hydrodynamics Fortran code for XeonPhi. The code is parallelized with OpenMP and is suitable for execution on a shared memory system. Due to the complexity of the code combined with the immaturity of the compiler we were unable to stay within the boundaries of Intel Compiler […]

Mar, 9

Real-time video denoising for 2D ultrasound streaming video on GPUs

Ultrasound videos are contaminated mainly by multiplicative noise, but also by additive noise. Over the past few decades, several studies have addressed removing noise from ultrasound images, such as the JY model [1] and the variational model, which removes both types of noise. However, removing this noise from the ultrasound video […]

CUDA

Mar, 9

RASR/NN: The RWTH Neural Network Toolkit for Speech Recognition

This paper describes the new release of RASR – the open source version of the well-proven speech recognition toolkit developed and used at RWTH Aachen University. The focus is put on the implementation of the NN module for training neural network acoustic models. We describe code design, configuration, and features of the NN module. The […]

CUDA

Recent source codes

Coccinelle: a C code transformation engine using SmPL for matches, refactorings, and bug fixing

DuoReduce: MLIR's benchmark

Shamrock: Multi-GPU hydrodynamics for astrophysics

LLMPerf: GPU Performance Modeling meets Large Language Models

Hercules: A Compiler for Productive Programming of Heterogeneous Systems

Celerity Runtime: High-level C++ for Accelerator Clusters

wgpy: WebGL accelerated numpy-compatible array library for web browser

Microbenchmarking OpenMP target offload with Catch2

SUperman: Highly Efficient Permanent Computation Library

TransCL: An Automatic CUDA-to-OpenCL Programs Transformation Framework
