high performance computing on graphics processing units: hgpu.org

Posts

Apr, 4

Adapting Particle Filter Algorithms to Many-Core Architectures

The particle filter is a Bayesian estimation technique based on Monte Carlo simulation. It is ideal for non-linear, non-Gaussian dynamical systems with applications in many areas, such as computer vision, robotics, and econometrics. Practical use has so far been limited because of steep computational requirements. In this study, we investigate how to design a particle […]

CUDA • OpenCL

Apr, 4

Deploying Graph Algorithms on GPUs: an Adaptive Solution

Thanks to their massive computational power and their SIMT computational model, Graphics Processing Units (GPUs) have been successfully used to accelerate a wide variety of regular applications (linear algebra, stencil computations, image processing and bioinformatics algorithms, among others). However, many established and emerging problems are based on irregular data structures, such as graphs. Examples can […]

CUDA

Apr, 4

Optimising Purely Functional GPU Programs

Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance. In this […]

CUDA

Apr, 4

Real-time Stereo Vision: Optimizing Semi-Global Matching

Semi-Global Matching (SGM) is arguably one of the most popular algorithms for real-time stereo vision. It is already employed in mass-production vehicles today. With applications in intelligent vehicles in mind (and fully autonomous vehicles in the long term), we aim to further improve the accuracy of SGM. In this study, we propose a straightforward extension […]

CUDA

Apr, 4

C Language Extensions for Hybrid CPU/GPU Programming with StarPU

Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and "accelerators", often in the form of graphics processing units (GPUs). StarPU is a C library that addresses this problem by providing users with ways to define "tasks" to be executed on CPUs or GPUs, along with the dependencies among them, and […]

OpenCL

Apr, 3

GPU Accelerated Automated Feature Extraction from Satellite Images

The availability of large volumes of remote sensing data urgently calls for a higher degree of automation in feature extraction. Fusing data from multiple sources, such as panchromatic, hyperspectral and LiDAR sensors, enhances the probability of identifying and extracting features such as buildings, vegetation or bodies of water by […]

CUDA

Apr, 3

Scaling up scientific computations by using map-reduce-like control flow on NUMA architectures

The clock speed of current CPUs and RAM has stopped scaling with Moore’s Law. Yet the scale of applications in science and engineering continues to increase. In order to address this scaling of applications, newer NUMA architectures are emerging. These include parallel disks, hybrid CPU-GPU, and many-core CPUs. Existing CPU-based algorithms, as well as legacy […]

CUDA

Apr, 3

Astrophysical data mining with GPU. A case study: genetic classification of globular clusters

We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single band HST images. The GPU version of […]

CUDA

Apr, 3

The Stencil Processing Unit: GPGPU Done Right

As computing moves to exascale, it will be dominated by energy-efficiency. We propose a new GPU-like accelerator called the Stencil Processing Unit (SPU), for implementing dense stencil computations in an energy-efficient manner. We address all the levels of the programming stack, from architecture, programming API, runtime system and compilation. First, a simple architectural innovation to […]

Apr, 3

Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming

Despite the vast interest in accelerator-based systems, programming large multinode GPUs is still a complex task, particularly with respect to optimal data movement across the host-GPU PCIe connection and then across the network. In order to address such issues, GPU-integrated MPI solutions have been developed that integrate GPU data movement into existing MPI implementations. Currently […]

CUDA • OpenCL

Apr, 1

Solving RFIC Simulation Tasks Using GPU Computations

A new generation of General Purpose Graphics Processing Unit (GPGPU) cards, with their large computational power, makes it possible to approach difficult tasks in the modeling of Radio Frequency Integrated Circuits (RFICs). Using different electromagnetic modeling methods, the Finite Element Method (FEM) and the Finite Integration Technique (FIT), to model Radio Frequency Integrated Circuit (RFIC) devices, large linear equations […]

CUDA

Apr, 1

Parallelization of the Cuckoo Search using CUDA Architecture

Cuckoo Search is one of the recent swarm intelligence metaheuristics. It has been successfully applied to a number of optimization problems but is still not very well researched. In this paper we present a parallelized version of the Cuckoo Search algorithm. The parallelization is implemented using the CUDA architecture. The algorithm is significantly changed compared to […]

CUDA


* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
