
Posts

Aug, 1

GPU-Accelerated Non-negative Matrix Factorization for Text Mining

An implementation of the non-negative matrix factorization algorithm for text mining on graphics processing units is presented. Performance gains of more than one order of magnitude are obtained.
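
A hedged illustration of the core computation (not the authors' code): the classical Lee-Seung multiplicative update for the factor H is elementwise, H ← H · (WᵀV) / (WᵀW H), and maps directly onto a GPU. The sketch below assumes the numerator WᵀV and denominator WᵀW H have already been formed by dense matrix multiplies (e.g. cuBLAS GEMM).

    // Hypothetical CUDA sketch: one elementwise multiplicative-update step for H.
    #include <cuda_runtime.h>

    __global__ void nmf_update_h(float* H, const float* numer, const float* denom, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            H[i] *= numer[i] / (denom[i] + 1e-9f);   // epsilon guards empty terms/topics
    }

    // launch: nmf_update_h<<<(n + 255) / 256, 256>>>(dH, dNumer, dDenom, n);
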
Jul, 31

accULL: An User-directed Approach to Heterogeneous Programming

The world of HPC is undergoing rapid change, and the range of computer architectures capable of achieving high performance has broadened. The arrival of computational accelerators such as GPUs is increasing performance while keeping the cost per GFLOP low, thus expanding the popularity of HPC. However, it is still difficult to exploit the new, complex processor hierarchies. […]
Jul, 31

Parallel programming on GPU using Intel Array Building Blocks

The goal of this project is to demonstrate parallel programming on a GPU using Intel's recent Array Building Blocks (Intel ArBB) technology. The main aim is to describe the programming model of Intel ArBB and to show, through examples, the effectiveness of this new technology in a GPU environment. Parallel programming is […]
Jul, 31

On Binaural Spatialization and the Use of GPGPU for Audio Processing

3D recordings and audio, namely techniques that aim to create the perception of sound sources placed anywhere in three-dimensional space, are becoming an interesting resource for composers, live performances and augmented reality. This thesis focuses on binaural spatialization techniques. We will tackle the problem from three different perspectives. The first one is related to […]
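
Binaural spatialization ultimately reduces to convolving a mono source with a pair of head-related impulse responses (HRIRs), one per ear. As a hedged sketch of how that maps onto a GPU (not the thesis code; a real-time implementation would typically use FFT-based convolution), a direct time-domain version with one thread per output sample might look like this:

    // Hypothetical CUDA sketch: time-domain convolution with left/right HRIRs.
    #include <cuda_runtime.h>

    __global__ void binaural_convolve(const float* x, int nx,
                                      const float* hrirL, const float* hrirR, int nh,
                                      float* outL, float* outR)
    {
        int n = blockIdx.x * blockDim.x + threadIdx.x;   // one output sample per thread
        if (n >= nx + nh - 1) return;
        float l = 0.0f, r = 0.0f;
        for (int k = 0; k < nh; ++k) {                   // y[n] = sum_k h[k] * x[n - k]
            int j = n - k;
            if (j >= 0 && j < nx) {
                l += hrirL[k] * x[j];
                r += hrirR[k] * x[j];
            }
        }
        outL[n] = l;
        outR[n] = r;
    }
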
Jul, 31

Application of the Mean Field Methods to MRF Optimization in Computer Vision

Mean field (MF) methods are energy optimization methods for Markov random fields (MRFs). These methods, which have their roots in solid-state physics, estimate the marginal density of each site of an MRF graph by iterative computation, similarly to loopy belief propagation (LBP). It appears that, overshadowed by LBP, the MF methods […]
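
Mean-field updates are attractive on GPUs because every site can be updated from its neighbours' current beliefs independently. A hedged sketch (an assumed 4-connected grid MRF with a Potts pairwise term, not the paper's code) of one synchronous MF sweep:

    // Hypothetical CUDA sketch: one synchronous mean-field sweep on a W x H grid
    // with L labels; beliefs q and unary costs are stored as [pixel * L + label].
    #include <cuda_runtime.h>
    #include <math.h>

    __global__ void meanfield_sweep(const float* unary, const float* qIn, float* qOut,
                                    int W, int H, int L, float lambda)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= W || y >= H) return;
        int p = y * W + x;

        float sum = 0.0f;
        for (int l = 0; l < L; ++l) {
            float pw = 0.0f;                       // expected Potts energy vs. neighbours
            if (x > 0)     pw += 1.0f - qIn[(p - 1) * L + l];
            if (x < W - 1) pw += 1.0f - qIn[(p + 1) * L + l];
            if (y > 0)     pw += 1.0f - qIn[(p - W) * L + l];
            if (y < H - 1) pw += 1.0f - qIn[(p + W) * L + l];
            float q = expf(-(unary[p * L + l] + lambda * pw));
            qOut[p * L + l] = q;
            sum += q;
        }
        for (int l = 0; l < L; ++l)                // normalize to a distribution
            qOut[p * L + l] /= sum;
    }
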
Jul, 31

MCMini: Monte Carlo on GPGPU

MCMini is a proof of concept that demonstrates the feasibility of Monte Carlo neutron transport using OpenCL, with a focus on performance. This implementation, written in C, shows that tracing particles and calculating reactions on a 3D mesh can be done in a highly scalable fashion. These results demonstrate a potential path forward for MCNP […]
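
The usual GPU mapping for Monte Carlo transport is one particle history per work-item. Below is a heavily simplified, hypothetical sketch of that pattern (in CUDA rather than the OpenCL used by MCMini, with a single homogeneous medium and only absorption or scattering):

    // Hypothetical CUDA sketch: one particle history per thread, counting absorptions.
    #include <cuda_runtime.h>
    #include <curand_kernel.h>

    __global__ void transport(unsigned long long seed, int nParticles,
                              float sigmaTotal, float absorbFrac, unsigned int* absorbed)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= nParticles) return;

        curandState rng;
        curand_init(seed, tid, 0, &rng);

        while (true) {
            float dist = -logf(curand_uniform(&rng)) / sigmaTotal; // sample free flight
            (void)dist;                      // a real code would advance the particle here
            if (curand_uniform(&rng) < absorbFrac) {               // sample collision type
                atomicAdd(absorbed, 1u);                           // absorbed: tally and stop
                break;
            }
            // otherwise the particle scatters and its history continues
        }
    }
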
Jul, 29

High-Performance Online Spatial and Temporal Aggregations on Multi-core CPUs and Many-Core GPUs

Motivated by the practical need to efficiently process large-scale taxi trip data, we have developed techniques for high-performance online spatial, temporal and spatiotemporal aggregations. These techniques include timestamp compression to reduce the memory footprint, simple linear data structures for efficient in-memory scans, and massively data-parallel GPU acceleration for spatial joins. Our experiments […]
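
One building block behind such online aggregations is a single-pass, data-parallel binning of timestamps. A hedged sketch (hypothetical code, assuming timestamps already compressed to seconds since the start of the period):

    // Hypothetical CUDA sketch: aggregate trip records into hourly bins with atomics.
    #include <cuda_runtime.h>

    __global__ void hourly_counts(const unsigned int* tripStartSec, int nTrips,
                                  unsigned int* binCounts, int nBins)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nTrips) return;
        unsigned int hour = tripStartSec[i] / 3600u;   // hour index of this record
        if (hour < (unsigned int)nBins)
            atomicAdd(&binCounts[hour], 1u);           // one atomic count per record
    }
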
Jul, 29

Sigma*: Symbolic Learning of Stream Filters

We present Sigma*, a novel technique for learning symbolic models of software behavior. Sigma* addresses the challenge of synthesizing models of software by using symbolic conjectures and abstraction. By combining dynamic symbolic execution, to discover symbolic input-output steps of a program, with counterexample-guided abstraction refinement, to over-approximate program behavior, Sigma* transforms an arbitrary source representation […]
Jul, 29

A Novel GPU Implementation of Eigen Analysis for Risk Management

Portfolio risk is commonly defined as the standard deviation of the portfolio's return. The empirical correlation matrix of asset returns in a portfolio has an intrinsic noise component, which is filtered out for more robust performance. Eigendecomposition is a widely used method for this noise filtering. The Jacobi algorithm has been a popular eigensolver due to its […]
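
For reference, one step of the classical Jacobi eigensolver annihilates a single off-diagonal entry with a plane rotation; a GPU implementation applies many such rotations on disjoint index pairs in parallel within each sweep. A hedged, serial sketch of the rotation arithmetic (illustration only, not the paper's implementation):

    // Hypothetical sketch: one Jacobi rotation zeroing A[p][q] of a symmetric n x n
    // matrix stored row-major (compiles as CUDA host code).
    #include <math.h>

    static void jacobi_rotate(float* A, int n, int p, int q)
    {
        float apq = A[p * n + q];
        if (fabsf(apq) < 1e-12f) return;               // entry is already (almost) zero
        float theta = (A[q * n + q] - A[p * n + p]) / (2.0f * apq);
        float t = (theta >= 0.0f ? 1.0f : -1.0f)
                  / (fabsf(theta) + sqrtf(theta * theta + 1.0f));
        float c = 1.0f / sqrtf(t * t + 1.0f);
        float s = t * c;

        for (int i = 0; i < n; ++i) {                  // A <- A * J (rotate columns p, q)
            float aip = A[i * n + p], aiq = A[i * n + q];
            A[i * n + p] = c * aip - s * aiq;
            A[i * n + q] = s * aip + c * aiq;
        }
        for (int i = 0; i < n; ++i) {                  // A <- J^T * A (rotate rows p, q)
            float api = A[p * n + i], aqi = A[q * n + i];
            A[p * n + i] = c * api - s * aqi;
            A[q * n + i] = s * api + c * aqi;
        }
    }
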
Jul, 29

Implementation and Evaluation of Recurrence Equation Solvers on GPGPU systems using Rearrangement of Array Configurations

Recurrence equation solvers are used in many numerical and other general-purpose applications, but a recurrence is inherently sequential, which makes it difficult to implement a parallel program for it. Recently, GPGPU (General-Purpose computing on Graphics Processing Units) has attracted a great deal of attention; it is used for general-purpose computations such as numerical […]
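
The standard way around the sequential dependence is to treat a first-order recurrence x[i] = a[i]*x[i-1] + b[i] as a scan over affine maps, since composing (a1,b1) followed by (a2,b2) gives (a1*a2, a2*b1 + b2), which is associative. A hedged, single-block sketch of this idea (hypothetical code, not the paper's implementation):

    // Hypothetical CUDA sketch: solve x[i] = a[i]*x[i-1] + b[i] with a Kogge-Stone
    // inclusive scan over affine maps. Launch with exactly n threads (n <= 1024):
    //   recurrence_scan<<<1, n, 2 * n * sizeof(float)>>>(da, db, x0, dx, n);
    #include <cuda_runtime.h>

    __global__ void recurrence_scan(const float* a, const float* b, float x0,
                                    float* x, int n)
    {
        extern __shared__ float sm[];         // sm[0..n): a-parts, sm[n..2n): b-parts
        float* sa = sm;
        float* sb = sm + n;
        int i = threadIdx.x;
        sa[i] = a[i];
        sb[i] = b[i];
        __syncthreads();

        for (int off = 1; off < n; off <<= 1) {
            float pa = 1.0f, pb = 0.0f;       // identity map if there is no predecessor
            if (i >= off) { pa = sa[i - off]; pb = sb[i - off]; }
            __syncthreads();
            if (i >= off) {                   // compose predecessor map, then own map
                sb[i] = sa[i] * pb + sb[i];
                sa[i] = sa[i] * pa;
            }
            __syncthreads();
        }
        x[i] = sa[i] * x0 + sb[i];            // apply the composed map to the seed x0
    }
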
Jul, 29

Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability

One of the difficulties for current GPGPU (General-Purpose computing on Graphics Processing Units) users is writing code that uses multiple GPUs. One limiting factor is that only a few GPUs can be attached to a single PC, which means that MPI (Message Passing Interface) becomes the common tool for using tens or more GPUs. However, […]
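
To make the limitation concrete: plain CUDA only addresses the GPUs attached to the local node, selected with cudaSetDevice, so scaling past a single PC requires MPI or a virtualization layer like the one proposed here. A minimal, hypothetical sketch of the single-node pattern:

    // Hypothetical CUDA sketch: split independent work across the locally attached GPUs.
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void scale(float* v, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] *= s;
    }

    int main(void)
    {
        int nDev = 0;
        cudaGetDeviceCount(&nDev);                 // only the GPUs in this PC are visible
        if (nDev > 16) nDev = 16;
        float* bufs[16];
        const int chunk = 1 << 20;

        for (int d = 0; d < nDev; ++d) {           // one chunk of the problem per GPU
            cudaSetDevice(d);
            cudaMalloc(&bufs[d], chunk * sizeof(float));
            cudaMemset(bufs[d], 0, chunk * sizeof(float));
            scale<<<(chunk + 255) / 256, 256>>>(bufs[d], chunk, 2.0f);
        }
        for (int d = 0; d < nDev; ++d) {           // wait for every device, then clean up
            cudaSetDevice(d);
            cudaDeviceSynchronize();
            cudaFree(bufs[d]);
        }
        printf("used %d GPU(s)\n", nDev);
        return 0;
    }
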
Jul, 28

Fast Linear Algebra on GPU

GPUs have been successfully used to accelerate many mathematical functions and libraries. A common limitation of these libraries is the minimum problem size they must handle in order to achieve a significant speedup over their CPU versions. This minimum-size requirement can prove prohibitive for many applications. It can be loosened by batching […]
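
Batching amortizes kernel-launch and latency costs over many small problems: a single launch processes the whole batch, for instance one thread block per small matrix. A hedged sketch of that idea (hypothetical code; vendor libraries expose the same pattern through batched GEMM interfaces):

    // Hypothetical CUDA sketch: multiply a batch of small m x m matrices,
    // one thread block per matrix, one thread per output element (requires m <= 32).
    #include <cuda_runtime.h>

    __global__ void batched_small_gemm(const float* A, const float* B, float* C, int m)
    {
        const float* a = A + (size_t)blockIdx.x * m * m;   // this block's operands
        const float* b = B + (size_t)blockIdx.x * m * m;
        float*       c = C + (size_t)blockIdx.x * m * m;

        int row = threadIdx.y, col = threadIdx.x;
        if (row < m && col < m) {
            float acc = 0.0f;
            for (int k = 0; k < m; ++k)
                acc += a[row * m + k] * b[k * m + col];    // dot product of row and column
            c[row * m + col] = acc;
        }
    }

    // launch: dim3 t(m, m); batched_small_gemm<<<batchCount, t>>>(dA, dB, dC, m);
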
