high performance computing on graphics processing units: hgpu.org

Posts

Dec, 21

Architecting an LTE Base Station with Graphics Processing Units

Due to the rapid growth of mobile communication, wireless base stations are becoming a significant consumer of computational resources. Historically, base stations have been built from ASICs, DSP processors, or FPGAs. This paper studies the feasibility of building wireless base stations from commercial graphics processing units (GPUs). GPUs are attractive because they are widely used […]

CUDA

Dec, 21

GPU Accelerated Graph SLAM and Occupancy Voxel Based ICP For Encoder-Free Mobile Robots

Learning a map of an unknown environment and localising a robot in it is a common problem in robotics, with solutions usually requiring an estimate of the robot’s motion. In scenarios such as Urban Search and Rescue, motion encoders can be highly inaccurate, and weight and battery requirements often limit computing power. We have developed […]

CUDA

Dec, 21

Parallel Compact Genetic Algorithm on CUDA-C Platform

This paper deals with the parallel implementation of the compact Genetic Algorithm on the GPU's Compute Unified Device Architecture (CUDA) platform. We elaborate on the implementation details for the parallel platform.
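In the compact Genetic Algorithm, the population is represented implicitly by a probability vector: each generation samples two candidates, lets them compete, and nudges each bit's probability toward the winner by 1/n, where n is the simulated population size. The sketch below is a sequential Python illustration under stated assumptions, not the paper's CUDA-C implementation: the OneMax fitness, the function names, and the integer-count representation of the probabilities are all hypothetical choices. The per-bit sampling and update loops are independent across bits, which is the kind of work a CUDA version could spread over threads, though the excerpt does not say how the authors parallelize it.

```python
import random


def onemax(bits):
    """Toy fitness function (assumption, for illustration): number of 1 bits."""
    return sum(bits)


def compact_ga(n_bits, pop_size, fitness=onemax, max_iters=100_000, seed=0):
    """Compact GA sketch: evolve a probability vector instead of a population.

    counts[i] / pop_size is the probability that bit i is 1. Each update moves
    it by exactly 1 / pop_size, mimicking a population of pop_size individuals.
    """
    rng = random.Random(seed)
    counts = [pop_size // 2] * n_bits  # start near p = 0.5 for every bit
    for _ in range(max_iters):
        # Sample two candidates; bit i is 1 with probability counts[i] / pop_size.
        a = [int(rng.randrange(pop_size) < c) for c in counts]
        b = [int(rng.randrange(pop_size) < c) for c in counts]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Shift the vector toward the winner wherever the two candidates differ.
        # Bits only differ when 0 < counts[i] < pop_size, so counts stay in range.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                counts[i] += 1 if winner[i] == 1 else -1
        # Stop once every probability has converged to 0 or 1.
        if all(c in (0, pop_size) for c in counts):
            break
    # Read out the most likely solution from the probability vector.
    return [int(2 * c >= pop_size) for c in counts]


if __name__ == "__main__":
    print(compact_ga(16, 20, seed=1))
```

Keeping the probabilities as integer counts avoids floating-point drift when testing for convergence to exactly 0 or 1.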

CUDA

Dec, 21

Videogame Graphics, BigData & Analytics

The purpose of this coffee shop read is to highlight how critical videogames are as a component of the "Convergence" of some amazing technologies (in particular: Cloud, Gaming/MMOG, Gamification and BigData) that is clear to many inside the IT world. I am not a deep technical "guru"; I am a businessman who seeks […]

CUDA • OpenCL • OpenGL

Dec, 21

Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms

The progress of high performance computing platforms is dramatic, and most of the simulations carried out on these platforms result in improvements on one level, yet expose shortcomings in the capabilities of current CFD packages. Therefore, hardware-aware design and optimizations are crucial for exploiting modern computing resources. This thesis proposes optimizations aimed at accelerating numerical […]

CUDA

Dec, 20

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other […]

CUDA • OpenCL

Dec, 20

Pannotia: Understanding Irregular GPGPU Graph Applications

GPUs have recently become popular for accelerating general-purpose data-parallel applications. However, most existing work has focused on GPU-friendly applications with regular data structures and access patterns. While a few prior studies have shown that some irregular workloads can also achieve speedups on GPUs, this domain has not been investigated thoroughly. Graph applications are one such […]

OpenCL

Dec, 20

Adapting the GA Approach to Solve Traveling Salesman Problems on CUDA Architecture

The vehicle routing problem (VRP) is one of the most challenging combinatorial optimization problems and has been studied for several decades. The number of solutions for the VRP grows exponentially as the number of points that must be visited increases. There are 3.0×10^64 different solutions for 50 visiting points in a direct solution, and it is […]
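The 3.0×10^64 figure matches the number of orderings of 50 points, 50! ≈ 3.04×10^64, which is the search space a direct (brute-force) enumeration would face. The small check below assumes that "solutions" means visiting orders; the helper name is hypothetical.

```python
import math


def ordering_count(n_points):
    """Number of distinct visiting orders of n points: n! (brute-force search space)."""
    return math.factorial(n_points)


if __name__ == "__main__":
    print(f"{ordering_count(50):.1e}")  # 3.0e+64
```

If tours that differ only in starting point or direction are treated as identical, the count for a closed tour drops to (n−1)!/2, which is still about 3.0×10^62 for 50 points and far beyond exhaustive search.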

CUDA

Dec, 20

XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD

Considerable research has been conducted recently on near-data processing techniques as real-world tasks increasingly involve large-scale and high-dimensional data sets. The advent of solid-state drives (SSDs) has spurred further research because of their processing capability and high internal bandwidth. However, the data processing capability of conventional SSD systems has not been impressive. In particular, they […]

Dec, 20

Towards global composition of performance-aware components for GPU-based systems

An important program optimization, especially for heterogeneous parallel systems, is performance-aware implementation selection: the static or dynamic selection among multiple implementation variants of the same computation, depending on the current execution context (such as currently available resources or performance-affecting parameter values). Doing it for multiple component calls inside a program while considering interferences […]
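A toy model can show why selecting variants call by call differs from selecting them for a whole sequence of calls. The sketch below is illustrative only and is not the paper's composition framework: the `Context` fields, the CPU/GPU cost models (arbitrary units), the flat `transfer_cost` charged when consecutive calls run on different devices, and the dynamic-programming search are all hypothetical assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical execution context for one component call."""
    n: int               # performance-affecting parameter (problem size)
    gpu_available: bool  # currently available resources


# Hypothetical cost models in arbitrary time units, one per implementation variant.
COST_MODELS = {
    "cpu": lambda ctx: ctx.n,              # no overhead, slow per element
    "gpu": lambda ctx: 500 + ctx.n // 20,  # fixed offload overhead, fast per element
}


def allowed(ctx):
    """Variants that can run in this context."""
    return [v for v in COST_MODELS if v != "gpu" or ctx.gpu_available]


def select_variant(ctx):
    """Local selection: cheapest predicted variant for this call alone."""
    return min(allowed(ctx), key=lambda v: COST_MODELS[v](ctx))


def select_sequence(calls, transfer_cost):
    """Global selection over a call sequence via dynamic programming on the last
    chosen variant, charging transfer_cost whenever consecutive calls run on
    different devices (operand data would have to move between them)."""
    best = {v: (COST_MODELS[v](calls[0]), [v]) for v in allowed(calls[0])}
    for ctx in calls[1:]:
        nxt = {}
        for v in allowed(ctx):
            def via(u):
                # Cost of reaching v from a prefix that ended in variant u.
                return best[u][0] + (transfer_cost if u != v else 0)
            prev = min(best, key=via)
            nxt[v] = (via(prev) + COST_MODELS[v](ctx), best[prev][1] + [v])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    calls = [Context(10_000, True), Context(400, True), Context(10_000, True)]
    print([select_variant(c) for c in calls])           # ['gpu', 'cpu', 'gpu']
    print(select_sequence(calls, transfer_cost=1000))  # (2520, ['gpu', 'gpu', 'gpu'])
```

Under these made-up models, per-call selection moves the small middle call to the CPU, which costs 2400 units of compute plus two transfers (4400 in total), while the sequence-level choice keeps all three calls on the GPU for 2520 units.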

CUDA

Dec, 19

Optimizing GPU to GPU Communication on Cray XK7

When developing an application for Cray XK7 systems, optimization of compute kernels is only a small part of maximizing scaling and performance. Programmers must consider the effect of the GPU’s distinct address space and the PCIe bus on application scalability. Without such considerations, applications rapidly become limited by transfers to and from the GPU and […]

CUDA

Dec, 19

Experiences Porting a Molecular Dynamics Code to GPUs on a Cray XK7

GPU computing has rapidly gained popularity as a way to achieve higher performance in many scientific applications. In this paper we report on the experience of porting a hybrid MPI+OpenMP molecular dynamics code to a GPU-enabled Cray XK7 to create a hybrid MPI+GPU code. The target machine, Indiana University’s Big Red II, consists of a […]

CUDA

