high performance computing on graphics processing units: hgpu.org

Posts

Sep, 24

Analyzing Use of OpenCL on the Cell Broadband Engine and a Proposal for OpenCL Extensions

Current processor architectures are diverse and heterogeneous. Examples include multicore chips, GPUs and the Cell Broadband Engine (CBE). The recent Open Compute Language (OpenCL) standard aims at efficiency and portability. This paper explores its efficiency when implemented on the CBE, without using CBE-specific features such as explicit asynchronous memory transfers. We based our experiments on […]

OpenCL

Sep, 24

Automatic Translation of CUDA to OpenCL and Comparison of Performance Optimizations on GPUs

As an open, royalty-free framework for writing programs that execute across heterogeneous platforms, OpenCL gives programmers access to a variety of data parallel processors including CPUs, GPUs, the Cell and DSPs. All OpenCL-compliant implementations support a core specification, thus ensuring robust functional portability of any OpenCL program. This thesis presents the CUDAtoOpenCL source-to-source tool that […]

CUDA, OpenCL

Sep, 24

OpenCL: a viable solution for high-performance medical image reconstruction?

Reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. For interventional image reconstruction, hardware optimization is mandatory. Manufacturers of medical equipment use a variety of high-performance computing (HPC) platforms, like FPGAs, graphics cards, or multi-core CPUs. A problem of this diversity is that many different frameworks and (vendor-specific) programming languages […]

OpenCL

Sep, 24

Single Scattering of Aspherical Particles in DDA Calculations on GPUs Using OpenCL

The global distribution and climatology of ice clouds are among the main uncertainties in climate modelling and prediction. In order to retrieve ice cloud properties from remote sensing measurements, the scattering properties of all cloud ice particle types must be known. The Discrete Dipole Approximation (DDA) simulates scattering of radiation by arbitrarily shaped particles and […]

OpenCL

Sep, 24

Functional Signal Processing with Pure and Faust Using the LLVM Toolkit

Pure and Faust are two functional programming languages useful for programming computer music and other multimedia applications. Faust is a domain-specific language specifically designed for synchronous signal processing, while Pure is a general-purpose language which aims to facilitate symbolic processing of complicated data structures in a variety of application areas. Pure is based on the […]

Sep, 24

Dynamic Data Structures for Taskgraph Scheduling Policies with Applications in OpenCL Accelerators

OpenCL is an emerging open framework for parallel programming in heterogeneous systems. OpenCL accelerators need to schedule the execution of submitted jobs with no (or only very imprecise) estimates of execution times, while respecting the dependencies among them, which are given in the form of a directed acyclic graph. This problem is known as stochastic taskgraph scheduling, […]

OpenCL
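To make the scheduling problem above concrete, here is a minimal sketch of classic greedy list scheduling of a task DAG onto identical devices. It is not the paper's method: it assumes the per-task estimates in the hypothetical `est` dictionary are treated as exact, and it ranks ready tasks by their estimated longest path to a sink (the "bottom level"), a common critical-path heuristic. A task becomes ready only once all of its predecessors have been scheduled, and it cannot start before they finish.

```python
import heapq
from collections import defaultdict


def bottom_levels(est, succ):
    """Longest estimated path from each task to a sink (critical-path priority)."""
    memo = {}

    def bl(t):
        if t not in memo:
            memo[t] = est[t] + max((bl(s) for s in succ[t]), default=0.0)
        return memo[t]

    for t in est:
        bl(t)
    return memo


def list_schedule(est, edges, num_devices):
    """Greedy list scheduling of a DAG onto identical devices.

    est: dict task -> estimated execution time (hypothetical input; assumed exact here)
    edges: list of (u, v) pairs meaning v depends on u
    Returns dict task -> (device, start, finish) under the estimates.
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in est}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    prio = bottom_levels(est, succ)
    ready_time = {t: 0.0 for t in est}  # time at which all predecessors have finished
    ready = [(-prio[t], t) for t in est if indeg[t] == 0]  # max-priority via negation
    heapq.heapify(ready)
    device_free = [(0.0, d) for d in range(num_devices)]  # (time device becomes free, id)
    heapq.heapify(device_free)
    schedule = {}
    while ready:
        _, t = heapq.heappop(ready)
        free_at, d = heapq.heappop(device_free)  # earliest-available device
        start = max(free_at, ready_time[t])  # respect dependencies
        finish = start + est[t]
        schedule[t] = (d, start, finish)
        heapq.heappush(device_free, (finish, d))
        for s in succ[t]:
            ready_time[s] = max(ready_time[s], finish)
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-prio[s], s))
    return schedule
```

For example, with `est = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}`, edges `a→b`, `a→c`, `b→d`, `c→d` and two devices, `b` and `c` run in parallel after `a`, and `d` starts once `b` finishes. When real run times differ from the estimates, which is the situation the abstract describes, a static plan like this degrades; that gap is what stochastic scheduling policies try to close.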

Sep, 24

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation […]

CUDA, OpenCL
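The source-to-source translation mentioned above relies on the fact that many CUDA constructs have direct OpenCL C counterparts (for example, `threadIdx.x` corresponds to `get_local_id(0)`, and `__syncthreads()` to `barrier(CLK_LOCAL_MEM_FENCE)`). The sketch below is a deliberately naive, regex-based illustration of that mapping. It is an assumption for illustration only, not how CU2CL or CUDAtoOpenCL work: production translators operate on a parsed representation of the source and also handle host code, address-space qualifiers on pointers, and much more.

```python
import re

# CUDA dim3 component -> OpenCL dimension index
_DIM = {"x": 0, "y": 1, "z": 2}

# CUDA index-space built-ins -> OpenCL work-item functions
_INDEX_BUILTINS = {
    "threadIdx": "get_local_id",
    "blockIdx": "get_group_id",
    "blockDim": "get_local_size",
    "gridDim": "get_num_groups",
}

# Simple keyword rewrites (pattern, replacement)
_KEYWORDS = [
    (r"__global__\s+void", "__kernel void"),
    (r"__syncthreads\(\)", "barrier(CLK_LOCAL_MEM_FENCE)"),
    (r"__shared__", "__local"),
]


def translate(cuda_src):
    """Toy translation of CUDA kernel syntax to OpenCL C (illustrative only)."""
    out = cuda_src
    for pattern, repl in _KEYWORDS:
        out = re.sub(pattern, repl, out)

    def index_repl(m):
        # e.g. blockDim.y -> get_local_size(1)
        return f"{_INDEX_BUILTINS[m.group(1)]}({_DIM[m.group(2)]})"

    return re.sub(r"\b(threadIdx|blockIdx|blockDim|gridDim)\.([xyz])\b", index_repl, out)
```

Applied to `__global__ void scale(float *a) { int i = blockIdx.x * blockDim.x + threadIdx.x; a[i] *= 2.0f; }`, it yields a kernel whose index is `get_group_id(0) * get_local_size(0) + get_local_id(0)`. Note that the result is still not valid OpenCL C as-is: the pointer parameter would also need a `__global` qualifier, which a syntax-aware translator adds.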

Sep, 24

An MDE Approach for Automatic Code Generation from MARTE to OpenCL

Advanced engineering and scientific communities have used parallel programming to solve their large-scale, complex problems. Achieving high performance is the main advantage of this choice. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we […]

OpenCL

Sep, 23

Integrated Framework for Heterogeneous Embedded Platforms Using OpenCL

The technology community is rapidly moving away from the age of computers and laptops, and is entering the emerging era of hand-held devices. With the rapid development of smartphones, tablets, and pads, there has been widespread adoption of Graphics Processing Units (GPUs) in the embedded space. The hand-held market is now seeing an ever […]

OpenCL

Sep, 23

Embedding OpenCL in C++ for Expressive GPU Programming

We present a high performance GPU programming language, based on OpenCL, that is embedded in C++. Our embedding provides shared data structures, typesafe kernel invocation, and the ability to more naturally interleave CPU and GPU functions, similar to CUDA but with the portability of OpenCL. For expressivity, our language provides an abstraction that releases control […]

OpenCL, OpenGL

Sep, 23

Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels

Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application kernel. Individual thread branching is supported by executing all control flow paths for threads in a thread group and only committing the results of threads on the current control path. While convergence […]

CUDA
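The branching behavior described above can be emulated in a few lines. The sketch below is only an illustration of baseline SIMT divergence, not of the paper's dynamic micro-kernels: a hypothetical thread group steps through both sides of an `if`/`else`, every lane computes each side, and a per-lane mask decides which result is committed. The returned efficiency (active lanes divided by issued lane-slots) shows why divergent branches waste SIMT width.

```python
def simt_branch(values, cond, then_fn, else_fn):
    """Emulate a lockstep thread group executing if/else with a per-lane mask.

    values: one input per lane; cond/then_fn/else_fn: per-lane functions.
    Returns (committed results, SIMT efficiency).
    """
    mask = [cond(v) for v in values]  # lanes that take the 'then' path
    out = list(values)
    issued = 0  # lane-slots issued across all executed paths
    active = 0  # lane-slots that actually committed a result
    for path_fn, path_mask in ((then_fn, mask), (else_fn, [not m for m in mask])):
        if not any(path_mask):
            continue  # in this sketch, a path that no lane takes is skipped
        # every lane in the group steps through this path in lockstep...
        results = [path_fn(v) for v in values]
        # ...but only lanes on this control path commit their result
        out = [r if m else o for r, m, o in zip(results, path_mask, out)]
        issued += len(values)
        active += sum(path_mask)
    efficiency = active / issued if issued else 1.0
    return out, efficiency


# Usage: a fully divergent 8-lane group runs at 50% efficiency.
results, eff = simt_branch(list(range(1, 9)), lambda v: v % 2 == 0,
                           lambda v: v // 2, lambda v: 3 * v + 1)
```

With lanes holding 1 to 8 and the branch splitting evenly on parity, both paths are issued to all eight lanes but each commits only four results, so efficiency is 0.5; a group whose lanes all take the same path stays at 1.0.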

Sep, 23

Accelerating reaction-diffusion simulations with general-purpose graphics processing units

SUMMARY: We present a massively parallel stochastic simulation algorithm (SSA) for reaction-diffusion systems implemented on Graphics Processing Units (GPUs). These are dedicated chips optimized to process a large number of floating-point operations in parallel, making them well suited for a range of scientific high-performance computations. Newer GPU generations provide a high-level programming interface which turns […]

CUDA
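For readers unfamiliar with SSA, the sketch below shows the classic Gillespie direct method for a single well-mixed compartment, run sequentially on a CPU. It is an assumption-laden simplification of what the abstract describes: there is no spatial diffusion and no GPU parallelism, and the reaction format (rate constant, reactant counts, state change) is a hypothetical choice for this example. Each step draws an exponential waiting time from the total propensity and then picks which reaction fires in proportion to its propensity.

```python
import math
import random


def propensity(k, reactants, x):
    """Mass-action propensity: k times the number of distinct reactant combinations."""
    a = k
    for species, n in reactants.items():
        a *= math.comb(x[species], n)  # 0 if too few molecules are present
    return a


def gillespie_direct(x0, reactions, t_end, seed=0):
    """Gillespie direct-method SSA for one well-mixed volume.

    x0: dict species -> initial count
    reactions: list of (rate_constant, reactants dict, change dict)
    Returns a list of (time, state) pairs, starting at t = 0.
    """
    rng = random.Random(seed)
    x = dict(x0)
    t = 0.0
    trajectory = [(t, dict(x))]
    while True:
        props = [propensity(k, r, x) for k, r, _ in reactions]
        a0 = sum(props)
        if a0 == 0.0:
            break  # no reaction can fire any more
        tau = -math.log(1.0 - rng.random()) / a0  # waiting time ~ Exp(a0)
        if t + tau > t_end:
            break
        t += tau
        threshold = rng.random() * a0  # choose reaction j with probability props[j] / a0
        acc = 0.0
        for j, a in enumerate(props):
            acc += a
            if threshold < acc:
                break
        for species, delta in reactions[j][2].items():
            x[species] += delta
        trajectory.append((t, dict(x)))
    return trajectory


# Usage: first-order decay A -> (nothing) with rate constant 1.0, starting from 100 molecules.
traj = gillespie_direct({"A": 100}, [(1.0, {"A": 1}, {"A": -1})], t_end=1e9, seed=42)
```

In the decay example every event removes exactly one molecule, so the run ends after 100 reactions when the propensity drops to zero. Reaction-diffusion variants typically split space into subvolumes and treat hops between neighbors as extra first-order "reactions", which is where the parallelism exploited on GPUs comes from.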
