high performance computing on graphics processing units: hgpu.org

Posts

Jun, 21

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform

Motivation: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. Results: We present CUSHAW, a parallelized […]

CUDA

Jun, 21

FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method

The Lagrangian vortex method offers an alternative numerical approach for direct numerical simulation of turbulence. The fact that it uses the fast multipole method (FMM)—a hierarchical algorithm for N-body problems with highly scalable parallel implementations—as numerical engine makes it a potentially good candidate for exascale systems. However, there have been few validation studies of Lagrangian […]

CUDA

Jun, 21

High-precision molecular dynamics simulation of UO2-PuO2: Anion self-diffusion in UO2

Our series of articles is devoted to high-precision molecular dynamics simulation of mixed actinide-oxide (MOX) fuel in the approximation of rigid ions and pair interactions (RIPI) using high-performance graphics processors (GPU). In this article we study self-diffusion mechanisms of oxygen anions in uranium dioxide (UO2) with the ten recent and widely used sets of interatomic […]

CUDA

Jun, 20

GPU Accelerated Greedy Algorithms for Compressed Sensing

For appropriate matrix ensembles, greedy algorithms have proven to be an efficient means of solving the combinatorial optimization problem associated with compressed sensing. This paper describes an implementation for graphics processing units (GPU) of hard thresholding, iterative hard thresholding, normalized iterative hard thresholding, hard thresholding pursuit, and a two stage thresholding algorithm based on compressive […]

CUDA

Jun, 20

Towards a GPU-based Implementation of Interaction Nets

We present ingpu, a GPU-based evaluator for interaction nets that heavily utilizes their potential for parallel evaluation. We discuss advantages and challenges of the ongoing implementation of ingpu and compare its performance to existing interaction nets evaluators.

CUDA

Jun, 20

GPU Computing: Image Convolution

Convolution of two functions is an important mathematical operation that has found heavy application in signal processing. In computer graphics and image processing we usually work with discrete functions (e.g. an image) and apply a discrete form of the convolution to remove high frequency noise, sharpen details, detect edges, or otherwise modulate the frequency domain of […]

OpenCL

Jun, 20

Parallel Implementation of the Wu-Manber Algorithm Using the OpenCL Framework

One of the most significant problems in computational biology is multiple pattern matching for locating nucleotide and amino acid sequence patterns in biological databases. Sequential implementations of these processes have become inadequate, due to an increasing demand for more computational power. Graphics cards offer highly parallel computational power, improving the performance of […]

OpenCL

Jun, 20

An Investigation into Concurrent Expectation Propagation

As statistical machine learning becomes more and more prevalent and models become more complicated and fit to larger amounts of data, approximate inference mechanisms become more and more crucial to their success. Expectation propagation (EP) is one such algorithm for inference in probabilistic graphical models. In this work, we introduce a robustified version of EP […]

OpenCL

Jun, 19

Two Algorithms for Sorting On Heterogeneous Clusters

In the past few years, performance improvements in CPUs and memory technologies have outpaced those of storage systems. When extrapolated to the exascale, this trend places strict limits on the amount of data that can be written to disk for full analysis, resulting in an increased reliance on characterizing in-memory data. Many of these characterizations […]

CUDA

Jun, 19

Parallel Rendering on Hybrid Multi-GPU Clusters

Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware embraces more and more a […]

OpenGL

Jun, 19

Optimizing dataflow applications on heterogeneous environments

The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate […]

CUDA

Jun, 19

Efficient simulations of long wave propagation and runup using a LBM approach on GPGPU hardware

We present an efficient implementation of the Lattice Boltzmann method (LBM) for the numerical simulation of the propagation of long ocean waves (e.g., tsunamis), based on the Nonlinear Shallow Water (NSW) wave equation. The LBM solution of NSW equations is fully nonlinear and it is assumed that the surface elevation is single-valued (hence, waves do […]

CUDA
