high performance computing on graphics processing units: hgpu.org

Posts

Feb, 27

Implementation of Smith-Waterman algorithm in OpenCL for GPUs

In this paper we present an implementation of the Smith-Waterman algorithm. The implementation is written in OpenCL and targets high-end GPUs. It can compute similarity indexes between reference and query sequences, and is designed for calculating sequence alignment paths. In addition, it is capable of handling very long reference sequences […]

OpenCL

Feb, 24

Very Fast Non-Dominated Sorting

A new and very efficient parallel algorithm for the Fast Non-dominated Sorting of Pareto fronts is proposed. By decreasing its computational complexity, the proposed method allows us to speed up the best Fast and Elitist Multi-Objective Genetic Algorithm to date (NSGA-II) by more than two orders of magnitude. Formal proofs […]

CUDA

Feb, 24

Raster2Mesh: Rasterization based CVT meshing

In this paper, we propose to extend high quality Centroidal Voronoi Tessellation (CVT) remeshing techniques to the case of surfaces which are not defined by triangle meshes, such as implicit surfaces. Our key observation is that rasterization routines are usually available to visualize these alternative representations, most often as OpenGL shaders efficiently producing surface samples […]

OpenGL

Feb, 24

A Vision for GPU-accelerated Parallel Computation on Geo-Spatial Datasets

We summarize the need and present our vision for accelerating geo-spatial computations and analytics using a combination of shared and distributed memory parallel platforms, with general-purpose Graphics Processing Units (GPUs) with 100s to 1000s of processing cores in a single chip forming a key architecture to parallelize over. A GPU can yield one-to-two orders of […]

CUDA

Feb, 24

A Virtual Machine Model for Accelerating Relational Database Joins using a General Purpose GPU

We demonstrate a speedup for database joins using a general purpose graphics processing unit (GPGPU). The technique is novel in that it operates on an SQL virtual machine model developed using CUDA. The implementation compiles an SQL statement to instructions of the virtual machine that are then executed in parallel on the GPU. We use […]

CUDA

Feb, 24

Accelerating Lagrangian Particle Dispersion in the Atmosphere with OpenCL across Multiple Platforms

FLEXPART is a popular simulator that models the transport and diffusion of air pollutants, based on the Lagrangian approach. It is capable of regional and global simulation and supports both forward and backward runs. A complex model like this contains many calculations suitable for parallelisation. Recently, a GPU-accelerated version of the simulator (FLEXCPP) has been […]

CUDA • OpenCL

Feb, 23

High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster

High performance computing of Meshless Time Domain Method (MTDM) on multi-GPU using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at University of Tsukuba is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of the electromagnetic wave propagation phenomena. However, the numerical domain must be […]

CUDA

Feb, 23

Document Image Binarization Using Image Segmentation Algorithm in Parallel Environment

The segmentation of text from poorly degraded document images is a very hard task due to the high intra-variation between the document background and the foreground text of different document images. The algorithms used for image processing take more time to execute on a single-core processor. The Graphics Processing Unit (GPU) is becoming most popular due […]

CUDA

Feb, 23

Characterising Bipartite Graph Matching Algorithms on GPUs

Two well-known bipartite graph matching algorithms, the Gale-Shapley algorithm and the Hungarian (Kuhn-Munkres) algorithm, have been ported to run on General-Purpose Graphics Processing Units (GPGPU) using kernels written with the CUDA programming model. This was done with the goal of characterising and assessing the performance and behaviour of these matching algorithms on the GPU, and […]

CUDA

Feb, 23

Investigation of the OpenCL SYCL Programming Model

OpenCL SYCL is a new heterogeneous and parallel programming framework created by the Khronos Group that tries to bring OpenCL programming into C++. In particular, it enables C++ developers to create OpenCL kernels, using all the popular C++ features, such as classes, inheritance and templates. What is more, it dramatically reduces programming effort and complexity, […]

OpenCL

Feb, 23

Asynchronous OpenCL/MPI numerical simulations of conservation laws

Hyperbolic conservation laws are important mathematical models for describing many phenomena in physics or engineering. The Finite Volume (FV) method and the Discontinuous Galerkin (DG) methods are two popular methods for solving conservation laws on computers. Those two methods are good candidates for parallel computing: a) they require a large amount of uniform and simple […]

OpenCL

Feb, 22

Stochastic Gradient Descent on GPUs

Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchronization patterns in SGD are quite complex. Furthermore, scheduling for scale-free graphs is challenging. This work examines several synchronization strategies for SGD, ranging from simple locking to conflict-free scheduling. We observe that static […]

OpenCL

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
