high performance computing on graphics processing units: hgpu.org

Posts

May, 20

CLgrep: A Parallel String Matching Tool

In this study, we broadly investigate the problem of string matching in the context of Heterogeneous Parallel Computing. An overview of string matching is given, in which the different forms of the string matching problem are distinguished and the classifications of string matching algorithms are discussed. As an alternative to grep for computationally intensive string matching […]

OpenCL

May, 20

Parallel 5 point SOR for solving the Convection Diffusion equation using graphics processing units

In this paper we study a parallel form of the SOR method for the numerical solution of the Convection Diffusion equation suitable for GPUs using CUDA. To exploit the parallelism offered by GPUs we consider the fine grain parallelism model. This is achieved by considering the local relaxation version of SOR. More specifically, we use […]

CUDA

May, 20

Collision detection on the GPU

Modern GPUs are powerful parallel computing devices. In this report, a quick look at the GPU architecture and programming is provided. Collision detection algorithms are briefly surveyed to provide a good overall picture of the field before examining two GPU based collision detection methods in more detail. The first method is a parallel implementation of […]

OpenCL

May, 20

CUDA Accelerated Robot Localization and Mapping

We present a method to accelerate robot localization and mapping by using CUDA (Compute Unified Device Architecture), the general purpose parallel computing platform on NVIDIA GPUs. In robotics, the particle filter-based SLAM (Simultaneous Localization and Mapping) algorithm has many applications, but is computationally intensive. Prior work has used CUDA to accelerate various robot applications, but […]

CUDA

May, 20

Solving Linear Equations with Conjugate Gradient Method on OpenCL Platforms

The parallelism in GPUs offers extremely good performance on many high-performance computing applications. Linear algebra is one of the areas which can benefit from GPU potential. The Conjugate Gradient (CG) benchmark is a significant computation in computing applications. It uses the conjugate gradient method, which offers numerical solutions to specific systems of linear equations. The […]

OpenCL

May, 19

Generating 3D Topologies with Multiple Constraints on the GPU

The objective of this paper is to demonstrate a topology optimization method that can handle multiple constraints. The method relies on the concept of topological sensitivity, which captures the first order change in any quantity of interest due to a topological change. Specifically, in this paper, the topological sensitivity field for each of the constraints is first […]

CUDA

May, 19

Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement

State-of-the-art graphics processing units (GPUs) employ single-instruction multiple-data (SIMD) style execution to achieve both high computational throughput and energy efficiency. As previous works have shown, there exists significant computational redundancy in SIMD execution, where different execution lanes operate on the same operand values. Such value locality is referred to as uniform vectors. In this […]

May, 19

Implicit Adaptive Volume Ray Casting

Ray Casting is an important visual application, used to visualize 3D datasets, such as CT data used in medical imaging. High quality image generation algorithms, known as ray casting, cast rays through the volume, performing compositing of each voxel into a corresponding pixel, based on voxel opacity and color. Since all rays perform the computations […]

CUDA

May, 19

Local Volatility FX Basket Option on CPU and GPU

We present high performance implementations on a CPU and an NVIDIA GPU of a Monte Carlo pricer for a simple FX basket option driven by a multi-factor local volatility model. Basket options such as these are typically considered too complicated to tackle analytically in a market-consistent manner, and are too high dimensional for PDE methods. […]

CUDA

May, 19

An implementation of level set based topology optimization using GPU

This work presents the implementation of a topology optimization approach based on the level set method on massively parallel computer architectures, in particular on a Graphics Processing Unit (GPU). Such architectures have become popular in recent years for complex and tedious scientific computation. They are composed of dozens, hundreds, or even thousands of cores specially […]

CUDA

May, 19

Parallel Selectivity Estimation for Optimizing Multidimensional Spatial Join Processing on GPUs

Managing large-scale data is typically memory intensive. The current generation of GPUs has much lower memory capacity than CPUs, which is often a limiting factor in processing large data. It is desirable to reduce the memory footprint of spatially joining large-scale datasets through query optimization. In this study, we present a technique of selectivity estimation for […]

CUDA

May, 19

Parallel Zonal Summations of Large-Scale Species Occurrence Data on Hybrid CPU-GPU Systems

Analyzing how species are distributed on the Earth has been one of the fundamental questions in biogeography and ecology for a long time. With world-wide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited to the Global Biodiversity Information Facility (GBIF) data portal. The sheer amounts of […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)