Posts
Feb, 19
Memory-efficient Adaptive Subdivision for Software Rendering on the GPU
The adaptive subdivision step for surface tessellation is a key component of the Reyes rendering pipeline. While this operation has been successfully parallelized for execution on the GPU using a breadth-first traversal, the resulting implementations are limited by their high worst-case memory consumption and high global memory bandwidth utilization. This report proposes an alternate strategy […]
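As a rough, self-contained illustration of what a breadth-first (level-by-level) adaptive subdivision looks like, here is a minimal CPU-side Python sketch; the patch representation, size estimate and threshold are placeholders, not the report's implementation:

```python
from collections import namedtuple

# A parametric patch is just a (u, v) rectangle here; a real Reyes
# pipeline would carry control points and screen-space bounds instead.
Patch = namedtuple("Patch", "u0 u1 v0 v1")

def screen_extent(patch):
    # Placeholder size estimate: parametric area stands in for the
    # projected screen-space extent a real renderer would measure.
    return (patch.u1 - patch.u0) * (patch.v1 - patch.v0)

def subdivide_breadth_first(root, threshold, max_levels=16):
    """Split patches level by level until each is small enough to dice."""
    current, diceable, level = [root], [], 0
    while current and level < max_levels:
        next_level = []
        for p in current:                      # one level = one parallel pass
            if screen_extent(p) <= threshold:
                diceable.append(p)             # small enough: emit for dicing
            else:                              # too large: split into four
                um, vm = 0.5 * (p.u0 + p.u1), 0.5 * (p.v0 + p.v1)
                next_level += [Patch(p.u0, um, p.v0, vm),
                               Patch(um, p.u1, p.v0, vm),
                               Patch(p.u0, um, vm, p.v1),
                               Patch(um, p.u1, vm, p.v1)]
        current, level = next_level, level + 1
    return diceable + current                  # leftovers if max_levels was hit

patches = subdivide_breadth_first(Patch(0.0, 1.0, 0.0, 1.0), threshold=0.01)
print(len(patches), "diceable patches")
```

Note how the per-level worklist can grow by up to 4x per pass; this geometric growth is exactly the worst-case memory behaviour that breadth-first GPU implementations are criticized for.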
Feb, 19
NMF-mGPU: non-negative matrix factorization on multi-GPU systems
BACKGROUND: In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained great interest in the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessor cluster. […]
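For readers unfamiliar with the underlying factorization, a minimal CPU-only NumPy sketch of the classic multiplicative-update NMF (illustrative only; not the NMF-mGPU code) is:

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H, with W (m x k)
    and H (k x n), using Lee & Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Update H, then W; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).random((100, 40)))
W, H = nmf(V, k=5)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```

Each iteration is dominated by dense matrix products, which is what makes the method a natural candidate for (multi-)GPU acceleration.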
Feb, 13
NUPAR: A Benchmark Suite for Modern GPU Architectures
Heterogeneous systems consisting of multi-core CPUs, Graphics Processing Units (GPUs) and many-core accelerators have gained widespread use among application developers and data-center platform developers. Modern-day heterogeneous systems have evolved to include advanced hardware and software features to support a spectrum of application patterns. Heterogeneous programming frameworks such as CUDA, OpenCL, and OpenACC have all […]
Feb, 13
Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures
Emerging hybrid accelerator architectures for high performance computing are often well suited to a data-parallel programming model. Unfortunately, programmers of these architectures face a steep learning curve that frequently requires learning a new language (e.g., OpenCL). Furthermore, the distributed (and frequently multi-level) nature of the memory organization of clusters of these machines provides […]
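As background on the targeted class of computations (and not the paper's proposed programming model), a plain NumPy sketch of one five-point Jacobi stencil sweep is:

```python
import numpy as np

def jacobi_step(u):
    """One five-point stencil sweep: each interior cell becomes the
    average of its four neighbours (classic relaxation update)."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((64, 64))
u[0, :] = 1.0                 # fixed boundary condition on one edge
for _ in range(100):
    u = jacobi_step(u)
print(u[1:4, 1:4])
```

Distributing such a grid across the multi-level memory of an accelerator cluster requires partitioning the domain and exchanging halo cells at partition boundaries, which is precisely the bookkeeping a higher-level programming model aims to hide.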
Feb, 13
Quadratic Pseudo-Boolean Optimization for Scene Analysis using CUDA
Many problems in early computer vision, like image segmentation, image reconstruction, 3D vision or object labeling, can be modeled by Markov Random Fields (MRFs). General algorithms to optimize an MRF, like Simulated Annealing, Belief Propagation or Iterated Conditional Modes, are either slow or produce low-quality results [Rother 07]. On the other hand, in the […]
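For context, the pairwise MRF energy minimized in such labeling problems can be written, in standard notation (not taken from the paper), as:

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i)
              + \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j),
\qquad x_i \in \{0, 1\}
```

For binary labels this energy is a quadratic pseudo-Boolean function; QPBO minimizes it via a graph construction that tolerates non-submodular pairwise terms, at the cost of possibly leaving some variables unlabeled.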
Feb, 13
Large-Scale Deep Learning on the YFCC100M Dataset
We present a work-in-progress snapshot of learning with a 15-billion-parameter deep learning network on HPC architectures, applied to the largest publicly available natural image and video dataset released to date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements […]
Feb, 13
Primal Dual Affine Scaling on GPUs
Here we present an implementation of the Primal-Dual Affine Scaling method for solving linear optimization problems on GPU-based systems. Strategies to convert the system generated by the complementary slackness theorem into a symmetric system are given. A new CUDA-friendly technique to solve the resulting symmetric positive definite subsystem is also developed. Various strategies to reduce […]
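To make the "symmetric system" remark concrete: in one standard textbook formulation (not necessarily the paper's exact derivation), the affine-scaling step for min cᵀx subject to Ax = b, x ≥ 0 solves

```latex
A\,\Delta x = 0, \qquad
A^{\mathsf{T}}\Delta y + \Delta s = 0, \qquad
S\,\Delta x + X\,\Delta s = -XSe,
\qquad X = \mathrm{diag}(x),\; S = \mathrm{diag}(s),
```

and eliminating Δx and Δs reduces this to the symmetric positive definite system (A X S⁻¹ Aᵀ) Δy = b for a primal-feasible x. Solving that subsystem dominates each iteration, which is why it is the natural target for a GPU-friendly solver.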
Feb, 12
A Real-time GPU Implementation of the SIFT Algorithm for Large-Scale Video Analysis Tasks
The SIFT algorithm is one of the most popular feature extraction methods and is therefore widely used in all sorts of video analysis tasks, such as instance search and duplicate/near-duplicate detection. We present an efficient GPU implementation of the SIFT descriptor extraction algorithm using CUDA. The major steps of the algorithm are presented, and for each step […]
Feb, 10
FSCL: Homogeneous programming, scheduling and execution on heterogeneous platforms
The last few years have seen activity towards programming models, languages and frameworks that address the increasingly wide range and broad availability of heterogeneous computing resources through raised programming abstraction and portability across different platforms. The effort spent in simplifying parallel programming across heterogeneous platforms is often outweighed by the need for low-level control over […]
Feb, 10
GPU-accelerated HMM for Speech Recognition
Speech recognition is used in a wide range of applications and devices such as mobile phones, in-car entertainment systems and web-based services. Hidden Markov Models (HMMs) are among the most popular algorithmic approaches applied in speech recognition. Training and testing an HMM is computationally intensive and time-consuming. Running multiple applications concurrently with speech recognition […]
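As a reminder of the core computation involved, a minimal NumPy sketch of the HMM forward algorithm (sequence evaluation; illustrative only, not the paper's GPU kernels) is:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: P(observation sequence | HMM).
    pi:  (N,)   initial state probabilities
    A:   (N, N) transition matrix, A[i, j] = P(state j | state i)
    B:   (N, M) emission matrix,  B[i, k] = P(symbol k | state i)
    obs: list of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # one time step: predict, then weight
    return alpha.sum()

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],
               [0.1, 0.9]])
print(forward(pi, A, B, obs=[0, 1, 1, 0]))
```

In practice the recursion is run in log space or with per-step rescaling to avoid numerical underflow on long observation sequences.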
Feb, 10
Analysis and Modeling of the Timing Behavior of GPU Architectures
Graphics processing units (GPUs) offer massive parallelism. For several years now, GPUs have also been usable for more general-purpose applications; a wide variety of applications can be accelerated efficiently using the CUDA and OpenCL programming models. Real-time systems frequently use many sensors that produce large amounts of data. GPUs […]
Feb, 10
Patterns and Rewrite Rules for Systematic Code Generation (From High-Level Functional Patterns to High-Performance OpenCL Code)
Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort. This results in a tension between achieving performance and code portability. Code is either tuned using device-specific optimizations to achieve maximum performance or is […]
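To illustrate the flavour of such rewrite rules with a toy example (plain Python here; the actual system works on richer functional patterns and emits OpenCL), consider map fusion:

```python
# A high-level 'map' pattern written as plain Python for illustration.
def pmap(f, xs):
    return [f(x) for x in xs]

# Rewrite rule (map fusion):  map f (map g xs)  ==  map (f . g) xs.
# In a GPU backend the same rewrite removes an intermediate buffer and
# an extra kernel launch; here it merely removes an intermediate list.
def fuse_maps(f, g):
    return lambda x: f(g(x))

xs = list(range(8))
lhs = pmap(lambda x: x + 1, pmap(lambda x: x * x, xs))
rhs = pmap(fuse_maps(lambda x: x + 1, lambda x: x * x), xs)
assert lhs == rhs
print(rhs)
```

Encoding such rules explicitly lets a systematic search over rule applications derive device-specific variants from a single high-level program, rather than hand-tuning code for each target.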