high performance computing on graphics processing units: hgpu.org

Posts

Mar, 16

Exploring power efficiency and optimizations targeting heterogeneous applications

Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high performance computing domains, due to the potential for approaching or exceeding the performance of a large cluster of CPUs for many parallel applications. The availability of programming standards such as OpenCL makes the use of GPUs even more […]

OpenCL

Mar, 15

Prius: A Runtime for Hybrid Computing

Prius is a framework for seamless execution of OpenCL programs across integrated, heterogeneous systems. Applications interfacing with Prius need not be aware of the characteristics of the hardware; instead the framework will automatically map kernel executions to suitable processors at run-time. The modular nature of the framework allows easy evaluation of new mapping strategies.

OpenCL

Mar, 12

High Performance GPU Accelerated Local Optimization in TSP

This paper presents a high performance GPU accelerated implementation of the 2-opt local search algorithm for the Traveling Salesman Problem (TSP). Using a GPU significantly decreases the execution time needed for tour optimization; however, it also requires a complicated and well-tuned implementation. As the problem size grows, the time spent on local optimization comparing the graph […]
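A 2-opt move removes two edges of the tour and reconnects it by reversing the segment between them. The following is a minimal sequential sketch in Python, not the authors' GPU implementation. It assumes Euclidean cities given as (x, y) pairs and uses a best-improvement loop; the names `two_opt` and `tour_length` are hypothetical. Each candidate pair (i, j) is scored independently of the others, which is why this inner loop is a natural target for GPU parallelization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(points: Sequence[Point], tour: Sequence[int]) -> float:
    """Total length of the closed tour (it returns to the start city)."""
    n = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % n]]) for k in range(n))


def two_opt(points: Sequence[Point], tour: List[int], eps: float = 1e-12) -> List[int]:
    """Hypothetical best-improvement 2-opt: repeatedly apply the single most
    improving segment reversal until no reversal shortens the tour."""
    tour = list(tour)
    n = len(tour)

    def d(u: int, v: int) -> float:
        return math.dist(points[u], points[v])

    while True:
        best_delta, best_move = 0.0, None
        # Every (i, j) candidate is evaluated independently of the others.
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                a, b = tour[i - 1], tour[i]
                c, e = tour[j], tour[(j + 1) % n]
                # Change in length from replacing edges (a,b),(c,e) with (a,c),(b,e).
                delta = d(a, c) + d(b, e) - d(a, b) - d(c, e)
                if delta < best_delta - eps:
                    best_delta, best_move = delta, (i, j)
        if best_move is None:
            return tour  # local optimum: no improving 2-opt move remains
        i, j = best_move
        tour[i:j + 1] = reversed(tour[i:j + 1])
```

For example, `two_opt([(0, 0), (1, 0), (1, 1), (0, 1)], [0, 2, 1, 3])` untangles the self-crossing tour into `[0, 1, 2, 3]`, reducing its length from about 4.83 to 4.0.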

Mar, 3

Low-Energy Application Parallelism 2013, LEAP 2013

LEAP 2013 is the place to learn about and share the latest advances in the use of high-performance parallel computing technology on low-power mobile CPU, GPU, FPGA and embedded processors. Two days of world-class education and networking will give developers, researchers, engineers and technology managers the vital knowledge they need to understand, assess and exploit […]

Mar, 2

Full Covariance Gaussian Mixture Models Evaluation on GPU

Gaussian mixture models (GMMs) are often used in data processing and classification tasks to model a continuous probability density in a multi-dimensional space. In cases where the dimension of the feature space is relatively high (e.g., in automatic speech recognition (ASR)), GMMs with a higher number of Gaussians with diagonal covariances (DC) instead […]

CUDA

OpenCL
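To make the diagonal-versus-full covariance trade-off concrete, here is a minimal pure-Python sketch (not the paper's GPU code) of the per-frame log-likelihood log p(x) = log Σ_k w_k N(x; μ_k, Σ_k). The function names are hypothetical. With a full covariance, each Gaussian needs a triangular solve against its Cholesky factor, roughly O(D²) work for D-dimensional features; with a diagonal covariance the factor is diagonal and the cost falls to O(D). For brevity the sketch factors each covariance on every call, whereas a practical evaluator would precompute the factors once.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Sequence[Sequence[float]]) -> Matrix:
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_logpdf(x: Sequence[float], mean: Sequence[float],
                    cov: Sequence[Sequence[float]]) -> float:
    """Log density of one full-covariance Gaussian at x."""
    D = len(x)
    L = cholesky(cov)
    # Forward substitution solves L y = x - mean, so y.y is the Mahalanobis term.
    y = [0.0] * D
    for i in range(D):
        y[i] = (x[i] - mean[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    maha = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(D))  # log|cov|
    return -0.5 * (D * math.log(2.0 * math.pi) + log_det + maha)


def gmm_loglik(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               covs: Sequence[Sequence[Sequence[float]]]) -> float:
    """log sum_k w_k N(x; mean_k, cov_k), combined with a stable log-sum-exp."""
    terms = [math.log(w) + gaussian_logpdf(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

The log-sum-exp step subtracts the largest term before exponentiating, so frames far from every component do not underflow to log(0).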

Feb, 9

Effectiveness of program transformations and compilers for directive-based GPU programming models

Accelerator devices like general-purpose graphics processing units (GPGPUs) play an important role in enhancing the performance of many contemporary scientific applications. However, programming GPUs using languages like C for CUDA or OpenCL requires a relatively high investment of time, and the resulting programs are often fine-tuned to perform well only on a particular device. […]

CUDA

Feb, 6

Implementation of Fast Artificial Neural Network for Pattern Classification on Heterogeneous System

Neural networks are part of an attempt to emulate how the human nervous system learns. Graphics processing units (GPUs) on graphics cards have hundreds of processing cores and a highly parallel architecture. This highly parallel architecture makes GPUs well suited to parallel architectures such as […]

OpenCL

Feb, 2

Heterogeneous GPU and CPU acceleration of a finite volume compressible flow solver for multiblock structured grids

The main objective of this project is to investigate the application of heterogeneous acceleration to a finite volume compressible flow solver for multiblock structured grids. Provided as Fortran source code, the ROTORMBMGS computational fluid dynamics program currently uses domain decomposition and message passing to distribute computation across multiple computers. Winning awards for scaling performance, there is […]

OpenCL

Feb, 2

XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures

Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes; scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports […]

CUDA

Jan, 31

OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance

Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunity in improving performance. A major cause of this […]

CUDA

Jan, 31

Particle method on GPU

In this article we present a graphics processing unit (GPU) implementation of a particle method for transport equations. More precisely, the numerical method under consideration is a remeshed particle method. Remeshing particles not only makes simulations more accurate in flows with strong strain but also leads to algorithms that are more regular in terms of data structures. […]

OpenCL

Jan, 26

Selection algorithm of graphic accelerators in heterogeneous cluster for optimization computing

The paper addresses the question of how to select the optimal GPU for OpenCL kernels launched on heterogeneous clusters that contain different types of GPUs. The authors propose an optimal GPU selection algorithm that helps achieve the best efficiency when executing programs on GPUs.

OpenCL

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA