high performance computing on graphics processing units: hgpu.org

Posts

Apr, 6

Improving GPU Performance Prediction with Data Transfer Modeling

Accelerators such as graphics processors (GPUs) have become increasingly popular for high performance scientific computing. Often, much effort is invested in creating and optimizing GPU code without any guaranteed performance benefit. To reduce this risk, performance models can be used to project a kernel’s GPU performance potential before it is ported. However, raw GPU execution […]

CUDA

Apr, 6

Real-Time Object-Space Edge Detection using OpenCL

At its most basic, object-space edge detection iterates through all polygonal edges in each mesh to find those edges that satisfy one or more edge tests. Those that do are expanded and rendered, while the remainder are ignored. These 3D edges, and their resulting accuracy and customizability, set object-space methods apart from all other categories […]

OpenCL

Apr, 6

Parallel Implementation of Dynamic Programming Algorithm Using Graphics Processing Unit

In this research, a dynamic programming algorithm (Viterbi) is implemented on an NVIDIA graphics processing unit using the CUDA model. Graphics processing units are becoming important in supporting central processing units for the acceleration of complex floating-point calculations. The complex computation runs in parallel on the graphics processing unit, as it contains […]

CUDA

Apr, 4

Adapting Particle Filter Algorithms to Many-Core Architectures

The particle filter is a Bayesian estimation technique based on Monte Carlo simulation. It is ideal for non-linear, non-Gaussian dynamical systems with applications in many areas, such as computer vision, robotics, and econometrics. Practical use has so far been limited because of steep computational requirements. In this study, we investigate how to design a particle […]

CUDA • OpenCL

Apr, 4

Deploying Graph Algorithms on GPUs: an Adaptive Solution

Thanks to their massive computational power and their SIMT computational model, Graphics Processing Units (GPUs) have been successfully used to accelerate a wide variety of regular applications (linear algebra, stencil computations, image processing and bioinformatics algorithms, among others). However, many established and emerging problems are based on irregular data structures, such as graphs. Examples can […]

CUDA

Apr, 4

Optimising Purely Functional GPU Programs

Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance. In this […]

CUDA

Apr, 4

Real-time Stereo Vision: Optimizing Semi-Global Matching

Semi-Global Matching (SGM) is arguably one of the most popular algorithms for real-time stereo vision. It is already employed in mass production vehicles today. Thinking of applications in intelligent vehicles (and fully autonomous vehicles in the long term), we aim at further improving the accuracy of SGM. In this study, we propose a straightforward extension […]

CUDA

Apr, 4

C Language Extensions for Hybrid CPU/GPU Programming with StarPU

Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs, and "accelerators", often in the form of graphical processing units (GPUs). StarPU is a C library that addresses this problem by providing users with ways to define "tasks" to be executed on CPUs or GPUs, along with the dependencies among them, and […]

OpenCL

Apr, 3

GPU Accelerated Automated Feature Extraction from Satellite Images

The availability of large volumes of remote sensing data demands a higher degree of automation in feature extraction, making it a pressing need. Fusing data from multiple sources, such as panchromatic, hyperspectral and LiDAR sensors, enhances the probability of identifying and extracting features such as buildings, vegetation or bodies of water by […]

CUDA

Apr, 3

Scaling up scientific computations by using map-reduce-like control flow on NUMA architectures

The clock speed of current CPUs and RAM has stopped scaling with Moore’s Law. Yet the scale of applications in science and engineering continues to increase. In order to address this scaling of applications, newer NUMA architectures are emerging. These include parallel disks, hybrid CPU-GPU, and many-core CPUs. Existing CPU-based algorithms, as well as legacy […]

CUDA

Apr, 3

Astrophysical data mining with GPU. A case study: genetic classification of globular clusters

We present a multi-purpose genetic algorithm, designed and implemented with GPGPU / CUDA parallel computing technology. The model was derived from our CPU serial implementation, named GAME (Genetic Algorithm Model Experiment). It was successfully tested and validated on the detection of candidate Globular Clusters in deep, wide-field, single band HST images. The GPU version of […]

CUDA

Apr, 3

The Stencil Processing Unit: GPGPU Done Right

As computing moves to exascale, it will be dominated by energy-efficiency. We propose a new GPU-like accelerator called the Stencil Processing Unit (SPU), for implementing dense stencil computations in an energy-efficient manner. We address all the levels of the programming stack, from architecture, programming API, runtime system and compilation. First, a simple architectural innovation to […]
