high performance computing on graphics processing units: hgpu.org

Posts

Nov, 12

GPU computing and Many Integrated Core Computing (PDP), 2018

TOPICS:
* GPU computing, multi GPU processing, hybrid computing
* Programming models, programming frameworks, CUDA, OpenCL, communication libraries
* Mechanisms for mapping codes
* Task allocation
* Fault tolerance
* Performance analysis
* Many Integrated Core architecture, MIC
* Intel coprocessor, Xeon Phi
* Vectorization
* Applications: image processing, signal processing, linear algebra, numerical simulation, […]

Nov, 12

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

We present a highly scalable Monte Carlo (MC) 3D photon transport simulation platform designed for heterogeneous computing systems. By developing a massively parallel MC algorithm using the OpenCL framework, this research extends our existing GPU-accelerated MC technique to a highly scalable, vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel […]

OpenCL

Nov, 12

Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify means to increase energy efficiency of HPC systems including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor that combines […]

OpenCL

Nov, 7

Radeon PRO Solid State Graphics (SSG) API User Manual

The Radeon Pro SSG software library enables peer-to-peer (P2P) data transfers between the GPU and Radeon on-board SSD devices. It provides a way to read OS file data from SSDs into OpenCL, OpenGL and DirectX buffers with very low-latency P2P communication. The development kit version of this library supports only the Microsoft Windows 10 operating […]

OpenCL • OpenGL

Nov, 5

Data Coherence Analysis and Optimization for Heterogeneous Computing

Although heterogeneous computing has enabled impressive program speed-ups, knowledge about the architecture of the target device is still critical to reap full hardware benefits. Programming such architectures is complex and is usually done by means of specialized languages (e.g. CUDA, OpenCL). The cost of moving and keeping host/device data coherent may easily eliminate any performance […]

OpenCL

Oct, 31

Automatic Scan Parallelization in OpenMP

Prefix Scan (or simply scan) is an operator that computes all the partial sums of a vector. A scan operation results in a vector where each element is the sum of the preceding elements in the original vector up to the corresponding position. Scan is a key operation in many relevant problems like sorting, lexical […]

CUDA • OpenCL
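The scan described in this abstract can be illustrated with a small Python sketch. It is not the paper's OpenMP method: the function names are hypothetical, and the step-doubling variant is the well-known Hillis-Steele formulation, shown only to suggest why a scan can be parallelized even though the sequential loop carries a running sum.

```python
from typing import List


def inclusive_scan(values: List[float]) -> List[float]:
    """Sequential reference: out[i] = values[0] + ... + values[i]."""
    out = []
    running = 0
    for v in values:
        running += v  # loop-carried dependence on the running sum
        out.append(running)
    return out


def hillis_steele_scan(values: List[float]) -> List[float]:
    """Step-doubling scan (illustrative, not the paper's algorithm).

    Each pass reads only from `cur` and writes to a fresh list, so all
    element updates within one pass are independent of each other and
    could be executed in parallel.
    """
    cur = list(values)
    offset = 1
    while offset < len(cur):
        cur = [
            cur[i] + cur[i - offset] if i >= offset else cur[i]
            for i in range(len(cur))
        ]
        offset *= 2
    return cur
```

Both functions return the same vector for the same input; the second needs about log2(n) passes instead of n dependent steps, at the cost of more total additions.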

Oct, 29

A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing

This PhD project is motivated by the need to develop and achieve better, more energy-efficient computing through the use of parallelism and heterogeneous systems. Our contribution consists of both theoretical aspects and in-depth, comprehensive empirical studies that aim to provide more insight into parallel and heterogeneous computing. Our first problem is […]

OpenCL

Oct, 24

A Fast and Generic GPU-Based Parallel Reduction Implementation

Reduction operations are extensively employed in many computational problems. Given a finite set of numeric elements, a reduction combines all elements of that set into a single value using a combiner function. A parallel reduction, in turn, is a reduction performed concurrently when multiple execution units are available. The current […]

CUDA • OpenCL
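As a rough illustration of the reduction defined in this abstract, the following Python sketch combines elements pairwise, level by level. It is an assumption-laden sketch rather than the paper's GPU implementation: `tree_reduce` is a hypothetical name, and the result matches a sequential reduction only if the combiner function is associative.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tree_reduce(items: Sequence[T], combine: Callable[[T, T], T]) -> T:
    """Pairwise (tree-shaped) reduction with a user-supplied combiner.

    The combinations within one level are independent of each other,
    which is what a parallel implementation would exploit.
    Assumes `combine` is associative; element order is preserved.
    """
    if not items:
        raise ValueError("cannot reduce an empty sequence")
    level = list(items)
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
    return level[0]
```

For example, `tree_reduce(data, max)` or `tree_reduce(data, lambda a, b: a + b)` reduce the same input with different combiners, which is the sense in which such an implementation is generic.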

Oct, 24

Parallel Computing for the Inverse of SPD matrix

In this paper, we propose a high-performance parallel computing method for the inverse of a symmetric positive definite (SPD) matrix. By reusing the inverses of diagonal sub-blocks and combining this technique with the newest OpenCL parallel computing framework, this method can effectively improve the computation of the inverse of an SPD matrix. Computing the […]

OpenCL

Oct, 21

How to distribute most efficiently a computation intensive calculation on an Android device to external compute units with an Android API

Is transferring computation-intensive calculations to external compute units the next trend? This master’s thesis investigates whether it is worth the effort to transfer a matrix multiplication from an Android phone to a System-on-Chip (SoC), using Bluetooth or WebSocket as communication protocols. The SoC solution used in this work is an Intel Altera Cyclone V based […]

OpenCL

Oct, 3

FPGA implementation of a Convolutional Neural Network for "Wake up word" detection

The popularity of machine learning has increased dramatically in recent years, and the possible applications include web search, speech recognition, object detection, etc. A big part of this development is due to the use of Convolutional Neural Networks (CNNs), for which high-performance Graphics Processing Units (GPUs) have been the most popular device. This […]

OpenCL

Oct, 3

Computing Treewidth on the GPU

We present a parallel algorithm for computing the treewidth of a graph on a GPU. We implement this algorithm in OpenCL, and experimentally evaluate its performance. Our algorithm is based on an O*(2^n)-time algorithm that explores the elimination orderings of the graph using a Held-Karp-like dynamic programming approach. We use Bloom filters to detect […]

OpenCL
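The Held-Karp-like dynamic programming over elimination orderings mentioned in this abstract can be sketched sequentially in Python. This is only a sketch under stated assumptions: it uses the standard subset recurrence TW(S) = min over v in S of max(TW(S \ {v}), |Q(S \ {v}, v)|), where Q counts the vertices outside the eliminated set that v can reach through eliminated vertices. The function names are hypothetical, a plain table replaces the Bloom filters named in the abstract, and nothing here reflects the paper's OpenCL implementation.

```python
from typing import List, Tuple


def q_size(adj: List[int], eliminated: int, v: int) -> int:
    """Degree of v once the vertices in bitmask `eliminated` are eliminated.

    Counts vertices outside `eliminated` (other than v) reachable from v
    by a path whose interior vertices all lie in `eliminated`.
    """
    seen = 1 << v
    stack = [v]
    count = 0
    while stack:
        u = stack.pop()
        new = adj[u] & ~seen
        seen |= new
        while new:
            low = new & -new
            w = low.bit_length() - 1
            new ^= low
            if (eliminated >> w) & 1:
                stack.append(w)  # interior vertex: keep walking
            else:
                count += 1       # boundary vertex: neighbour of v
    return count


def treewidth(n: int, edges: List[Tuple[int, int]]) -> int:
    """Exact treewidth by DP over all 2^n vertex subsets (small n only)."""
    adj = [0] * n
    for a, b in edges:
        adj[a] |= 1 << b
        adj[b] |= 1 << a
    tw = [0] * (1 << n)
    tw[0] = -1  # neutral start value for the max below
    for s in range(1, 1 << n):
        best = n
        rest = s
        while rest:
            low = rest & -rest
            v = low.bit_length() - 1
            rest ^= low
            prev = s ^ low  # v is eliminated last among the set s
            best = min(best, max(tw[prev], q_size(adj, prev, v)))
        tw[s] = best
    return tw[(1 << n) - 1]
```

A call such as `treewidth(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` evaluates a 4-cycle. The table has 2^n entries, so this form is practical only for small graphs.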
