high performance computing on graphics processing units: hgpu.org

Posts

Jun, 16

An in-depth performance analysis of irregular workloads on VLIW APU

Heterogeneous multi-core architectures have a higher performance/power ratio than traditional homogeneous architectures. Due to their heterogeneity, these architectures support diverse applications, but developing parallel algorithms on them can be difficult. Implementing algorithms for heterogeneous systems often requires proprietary languages, which limits portability. Although general purpose graphics processing units (GPUs) have shown great promise […]

OpenCL

Jun, 8

Native Offload of Haskell Repa Programs to GPGPU

In light of recent hardware advances, General Purpose Graphics Processing Units (GPGPUs) are becoming increasingly commonplace, and demand novel programming models to account for their radically different architecture. For the most part, existing approaches to programming GPGPUs within a high-level programming language choose to embed a domain-specific language (DSL) within a host metalanguage and […]

OpenCL

Jun, 5

Mapping parallel programs to heterogeneous multi-core systems

Heterogeneous computer systems are ubiquitous in all areas of computing, from mobile to high-performance computing. They promise to deliver increased performance at lower energy cost than purely homogeneous, CPU-based systems. In recent years GPU-based heterogeneous systems have become increasingly popular. They combine a programmable GPU with a multi-core CPU. GPUs have become flexible enough to […]

OpenCL

Jun, 3

Visualization Tool for GPGPU Programming

The running times of some sequential programs could be greatly reduced by converting their parallelizable, time-dominant code and running it on a massively parallel processor architecture. Example application areas include bioinformatics, molecular dynamics, video and image processing, signal and audio processing, medical imaging, and cryptography. A low-cost, low-power parallel computing platform for […]

CUDA • OpenCL

Jun, 2

Loo.py: transformation-based code generation for GPUs and CPUs

Today’s highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets […]

OpenCL

May, 25

Engineering a static verification tool for GPU kernels

We report on practical experiences over the last 2.5 years related to the engineering of GPUVerify, a static verification tool for OpenCL and CUDA GPU kernels, plotting the progress of GPUVerify from a prototype to a fully functional and relatively efficient analysis tool. Our hope is that this experience report will serve the verification community […]

CUDA • OpenCL

May, 21

Vector Quantization: A Many-Core Approach

Many-core computing is a currently growing concept that allows true parallelization of computational tasks. In this paper, the vector quantization algorithm was adapted to the many-core concept with the objective of compressing images encoded in the PGM format. For that, a given sequential implementation of the algorithm was optimized and […]

CUDA • OpenCL

May, 18

StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators

GPUs have largely entered HPC clusters, as shown by the top entries of the latest TOP500 issue. Exploiting such machines is, however, very challenging, not only because it means combining two separate paradigms, MPI and CUDA or OpenCL, but also because nodes are heterogeneous and thus require careful load balancing within the nodes themselves. The current paradigms […]

CUDA • OpenCL

May, 9

Applying Source Level Auto-Vectorization to Aparapi Java

Ever since chip manufacturers hit the power wall preventing them from increasing processor clock speed, there has been an increased push towards parallelism for performance improvements. This parallelism comes in the form of data-parallel single instruction, multiple data (SIMD) instructions, as well as parallel compute cores in both central processing units (CPUs) and […]

CUDA • OpenCL

May, 7

Simulation of earthquake sloshing loads in a nuclear reactor

Sloshing flow inside a Lead-cooled Fast Nuclear Reactor during an earthquake is modelled, focusing on the evaluation of the loads exerted by the fluid on the structure. AQUAgpusph, a free-software, OpenCL-accelerated SPH code, has been used. This tool is analysed, including a performance comparison with some available GPU-accelerated SPH codes, […]

OpenCL

May, 6

Implementing an efficient method of check-pointing on CPU-GPU

In this paper, we describe the design, implementation, verification and analysis of fine-grained architectural support for efficient check-pointing and restart on a CPU-GPU heterogeneous system. We use Multi2Sim, a simulator capable of emulating a CPU-GPU system. The simulator can emulate a 32-bit x86 CPU that launches OpenCL kernels on the GPU […]

OpenCL

May, 6

Multireduce and Multiscan on Modern GPUs

With the introduction of platforms like CUDA and OpenCL, the superior computing power of modern GPUs compared to CPUs is used more and more often to accelerate general-purpose computations. Data-parallel primitives like reduce, scan, or sort can be used as simple, deterministic building blocks for parallel algorithms, hiding the complexity of the underlying […]

CUDA