high performance computing on graphics processing units: hgpu.org

Posts

Oct, 4

Berkeley Dwarfs on CUDA

The performance of graphics processing units (GPUs) has improved greatly over the last ten years. The first graphics cards were developed in the late 1990s and were targeted at the mass market. These first cards were special-purpose hardware, designed to accelerate the graphics processing required in computer games. As the interest in computer games continued, GPU […]

CUDA • OpenCL

Oct, 4

Comparing Parallel Simulation of Social Agents using Cilk and OpenCL

Recent advances in wireless/mobile communication and body-worn sensors, together with ambient intelligence and seamlessly integrated pervasive technology, have paved the way for applications that operate on social signals, i.e., the sensing and processing of group behavior, interpersonal relationships, or emotions. Thinking on a large scale, it should be apparent that modeling social systems allowing to study […]

OpenCL

Oct, 4

Optimization of the Gaussian Mixture Model Evaluation on GPU

In this paper we present a highly optimized implementation of the Gaussian mixture acoustic model evaluation algorithm. Evaluation of these likelihoods is one of the most computationally intensive parts of automatic speech recognizers, but it can be parallelized well and offloaded to GPU devices. Our approach offers significant speed-up compared to the recently published approaches, since it […]

CUDA • OpenCL

Oct, 4

Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance raytracing program. The algorithm was optimized to reduce both redundant data input/output operations and floating-point operations. To further accelerate the simulation, the matrix multiplications were implemented using parallel computing on a graphics processing unit. We […]

OpenCL

Oct, 3

Transformation of CPU-based Applications To Leverage on Graphics Processors using CUDA

Scientific computation requires a great amount of computing power, especially for floating-point operations, but a high-end multi-core processor is currently limited in terms of floating-point performance and parallelization. Recent technological advances have made parallel computing technically and financially feasible using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. This research focuses on measuring […]

CUDA

Oct, 3

Parallel Game Tree Search Using GPU

The parallel performance of graphics cards in desktop computers generally exceeds that of conventional processors. The purpose of this paper is to identify possibilities for task parallelization when searching and evaluating game trees and to propose algorithms that would perform better on the SIMD processors of graphics cards than on regular desktop processors. On proposed algorithms’ basis […]

CUDA

Oct, 3

Implementation of the optimization algorithms on GPGPU architecture and multi-cores

This bibliographic study mainly synthesizes the key ideas of parallel architectures and neural network models, and discusses the algorithm design methods that will be used on GPGPUs and multi-cores to realize the optimizations. Since the neural network computational models are regarded as valuable tools to solve many scientific and practical problems, and it […]

CUDA • OpenCL

Oct, 3

GPU-Accelerated DNA Distance Matrix Computation

Distance matrix calculation used in phylogeny analysis is computationally intensive. Growing sequence data sets necessitate fast computation methods. This paper accelerates Felsenstein’s DNADIST program by using OpenCL to exploit the great computational capability of graphics cards. The GPU-accelerated DNADIST program achieves more than 12-fold speedup over the serial CPU program on a personal workstation […]

OpenCL

Oct, 3

Parallel SAT-Solving with OpenCL

In the last few decades there have been substantial improvements in approaches for solving the Boolean satisfiability problem. Many of these improvements consisted in elaborating on existing algorithms. For complete solvers, this led to more efficient branching heuristics and the use of watched literals for unit propagation; incomplete solvers on the […]

OpenCL

Oct, 3

Heterogeneous Computing with OpenCL

Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous […]

OpenCL

Oct, 3

An OpenCL Fast Fourier Transformation

This paper describes an implementation strategy in preparation for an implementation of an OpenCL FFT. The two factors most crucial to obtaining high performance on a GPU for an FFT implementation, memory bandwidth and locality, are highlighted. Theoretical upper bounds for performance in terms of the locality factor are derived. An implementation […]

OpenCL

Oct, 3

Realtime Computation of a VST Audio Effect Plugin on the Graphics Processor

A plugin system for real-time GPGPU audio effect calculation on the graphics processing unit of the computer system is presented. The prototype application is the rendering of mono audio material with head-related transfer functions (HRTFs) to create the impression of a sound source located in a certain direction relative to the listener’s head. The […]

CUDA • OpenCL
