high performance computing on graphics processing units: hgpu.org

Posts

Sep, 26

PGEM: Preemptive GPGPU Execution Model for Runtime Engines

General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique to enhance the computation of parallel programs. Applying this technique to real-time applications, however, requires additional support for timeliness of execution. In particular, the non-preemptive nature of GPGPU, associated with copying data to/from the device memory and launching code onto the […]

CUDA

Sep, 26

Manycore high-performance computing in bioinformatics

Mining the increasing amount of genomic data requires very efficient tools. Efficiency can be increased with better algorithms, but one could also take advantage of the hardware itself to reduce application runtimes. For several years, heat-dissipation issues have prevented processors from reaching higher frequencies. One of the answers […]

CUDA • OpenCL

Sep, 26

Generating GPU Code from a High-level Representation for Image Processing Kernels

We present a framework for representing image processing kernels based on decoupled access/execute metadata, which allow the programmer to specify both the execution constraints and the memory access patterns of a kernel. The framework performs source-to-source translation of kernels expressed in high-level framework-specific C++ classes into low-level CUDA or OpenCL code with effective device-dependent optimizations such as […]

CUDA • OpenCL

Sep, 26

LLVM to PTX Backend

The low-level virtual machine (LLVM) compiler infrastructure is a mature and stable framework for implementing optimization and compiler passes. H. Rhodin presented an LLVM backend to generate Parallel Thread Execution (PTX) instructions from LLVM bitcode. PTX is used as an intermediate representation for parallel programming. This paper discusses Rhodin’s PTX generator. Due to the similarity between […]

CUDA

Sep, 26

A Uniform Platform to Support Multigenerational GPUs for High Performance Stream-based Computing

GPU-based computing, known as GPGPU, has become a popular field of high-performance computing. This paper focuses on the design and implementation of a uniform GPGPU application that is optimized for both legacy and recent GPU architectures. As a typical example of such a GPGPU application, this paper will discuss […]

CUDA • OpenCL • OpenGL

Sep, 26

High-level GPU computing with jacket for MATLAB and C/C++

We describe a software platform for the rapid development of general purpose GPU (GPGPU) computing applications within the MATLAB computing environment, C, and C++: Jacket. Jacket provides thousands of GPU-tuned function syntaxes within MATLAB, C, and C++, including linear algebra, convolutions, reductions, and FFTs as well as signal, image, statistics, and graphics libraries. Additionally, Jacket […]

CUDA • OpenGL

Sep, 26

From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming

In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide a […]

CUDA • OpenCL

Sep, 26

Identifying scalar behavior in CUDA kernels

We propose a compiler analysis pass for programs expressed in the Single Program, Multiple Data (SPMD) programming model. It statically identifies several kinds of regular patterns that can occur between adjacent threads, including common computations, memory accesses at consecutive locations or at the same location, and uniform control flow. This knowledge can be exploited by […]

CUDA

Sep, 26

Putting Automatic Polyhedral Compilation for GPGPU to Work

Automatic parallelization is becoming more important as parallelism becomes ubiquitous. The first step toward automation is to develop a theoretical foundation, for example, the polyhedron model. The second step is to implement the algorithms studied in the theoretical framework and get them to work in a compiler that can be used to parallelize real […]

CUDA

Sep, 26

Running unstructured grid-based CFD solvers on modern graphics hardware

Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three-dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as using an appropriate numbering scheme and data layout. The applicability of per-block shared […]

CUDA

Sep, 25

Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs

We present a study of three important kernels that occur frequently in iterative statistical applications: K-Means, Multi-Dimensional Scaling (MDS), and PageRank. We implemented each kernel using OpenCL and evaluated its performance on an NVIDIA Tesla GPGPU card. By examining the underlying algorithms and empirically measuring the performance of various components of the kernel, we explored […]

OpenCL

Sep, 25

Exploiting Heterogeneous Computing Platforms By Cataloging Best Solutions For Resource Intensive Seismic Applications

Today's large heterogeneous data centers lack methods to appraise the best-fitting solutions with regard to, among other factors, hardware acquisition cost, development time, and performance. Resource-intensive applications in particular benefit from increased data center utilization to leverage heterogeneous resources and accelerators. In this paper, we implement various methods to accelerate a seismic modeling application, which is […]

OpenCL

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler