high performance computing on graphics processing units: hgpu.org

Posts

Aug, 28

A Research of MapReduce with GPU Acceleration

MapReduce is an efficient distributed computing model for large data sets. Data processing is fully distributed across a huge number of nodes, and a MapReduce cluster is highly scalable. However, single-node performance is gradually becoming a bottleneck in compute-intensive jobs, which makes it difficult to extend the MapReduce model to wider application fields […]

OpenCL
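The three phases of a MapReduce job (map, shuffle, reduce) can be sketched in a few lines. This is a minimal single-process illustration, and word count is the usual teaching example rather than a workload from the paper. On a real cluster the framework distributes map and reduce tasks across nodes. The per-node work inside `map_phase` and `reduce_phase` is where the single-node bottleneck described above would show up.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token.
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group emitted values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster each document would be a separate map task; here they run in sequence.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the GPU maps", "the CPU reduces the output"]))
    # {'the': 3, 'gpu': 1, 'maps': 1, 'cpu': 1, 'reduces': 1, 'output': 1}
```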

Aug, 27

Perceptually Optimized Real-Time Computer Graphics

Perceptual optimization, the application of human visual perception models to remove imperceptible components in a graphics system, has been proven effective in achieving significant computational speedup. Previous implementations of this technique have focused on spatial level of detail reduction, which typically results in noticeable degradation of image quality. This thesis introduces refresh rate modulation (RRM), […]

OpenCL

Aug, 21

GPU-Accelerated Light Stemmer for the Arabic Language

Data preprocessing is a vital aspect of information retrieval, and stemming is a major preprocessing task. The goal of stemming is to reduce the inflectional forms, and some of the derivational forms, of a word to its base form. When dealing with the massive amounts of data on the web, preprocessing generally consumes a major portion of […]

OpenCL
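A light stemmer strips a small set of prefixes and suffixes instead of performing full morphological analysis. The sketch below is a simplified illustration and not the stemmer from the paper. Its affix lists are an assumption, modelled on the definite-article prefixes and common suffixes that Arabic light stemmers typically remove. Because each word is stemmed independently, the work plausibly maps onto one GPU thread per word.

```python
# Illustrative affix lists (an assumption, not the paper's lists).
# Longer prefixes come first so that, e.g., "وال" is tried before "ال".
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 2  # never shorten a word below two letters


def light_stem(word: str) -> str:
    # Strip at most one prefix, keeping at least MIN_STEM letters.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    # Strip at most one suffix under the same length guard.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    return word


def stem_all(words: list[str]) -> list[str]:
    # Every word is handled independently, so this map is trivially data-parallel.
    return [light_stem(w) for w in words]
```

For example, `stem_all(["والكتاب", "المعلمون"])` strips the article from the first word and both the article and the plural suffix from the second.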

Aug, 18

Fractals Image Rendering and Compression using GPUs

Fractal image compression offers significant advantages over conventional image compression. Although fractal encoding time is considerably higher than that of conventional methods, decoding is far faster and almost instantaneous. In addition, fractal images are resolution-independent, meaning that they will render with the same intensity and quality even […]

OpenCL
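Decoding a fractal code amounts to repeatedly applying a contractive transform, starting from an arbitrary image, until the result converges. Encoding is slower because it must search for a well-matching domain block for every range block. The 1-D sketch below illustrates only the decoding side. Its code format (one domain index, scale, and offset per range block) and its block sizes are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Optional, Tuple

RANGE = 2   # samples per range block (illustrative choice)
DOMAIN = 4  # samples per domain block; shrunk 2:1 to match a range block

# Hypothetical fractal code: one (domain block index, scale, offset) per range block.
Code = List[Tuple[int, float, float]]


def apply_transform(signal: List[float], code: Code) -> List[float]:
    out: List[float] = []
    for domain_idx, scale, offset in code:
        first = domain_idx * DOMAIN
        block = signal[first:first + DOMAIN]
        if len(block) != DOMAIN:
            raise ValueError("domain block index out of range")
        # Shrink the domain block to range size by averaging neighbouring samples.
        shrunk = [(block[2 * i] + block[2 * i + 1]) / 2 for i in range(RANGE)]
        # Affine map on intensities; |scale| < 1 keeps the transform contractive.
        out.extend(scale * v + offset for v in shrunk)
    return out


def decode(code: Code, length: int, iterations: int = 40,
           start: Optional[List[float]] = None) -> List[float]:
    if len(code) * RANGE != length:
        raise ValueError("code must hold one entry per range block")
    # Because the transform is contractive, any starting signal converges
    # to the same fixed point, which is the decoded image.
    signal = list(start) if start is not None else [0.0] * length
    for _ in range(iterations):
        signal = apply_transform(signal, code)
    return signal


if __name__ == "__main__":
    code = [(1, 0.5, 1.0), (0, -0.4, 3.0), (1, 0.3, 0.0), (0, 0.5, 2.0)]
    print(decode(code, 8))
```

Each iteration is a cheap, independent computation per range block, which is consistent with the abstract's point that decoding is fast.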

Aug, 17

Rootbeer: Seamlessly using GPUs from Java

When converting a serial program into a parallel program that can run on a Graphics Processing Unit (GPU), the developer must choose which functions will run on the GPU. For each chosen function, he or she needs to manually write code to: 1) serialize state to GPU memory, 2) define the kernel code […]

CUDA • OpenCL
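To see what the first manual step involves, consider serializing object state into a flat buffer that can be copied to GPU memory in a single transfer. The sketch below does this by hand. The `Particle` type and the fixed record layout are hypothetical. Rootbeer itself works on Java objects, and its generated serializer is not shown here.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:  # hypothetical object whose state a kernel needs
    x: float
    y: float
    ident: int


# Fixed little-endian record: two float64 fields and one int64, no padding.
RECORD = struct.Struct("<ddq")


def serialize(particles: List[Particle]) -> bytes:
    # Pack each object's fields back to back so the whole buffer can be
    # copied to device memory at once.
    buf = bytearray(RECORD.size * len(particles))
    for i, p in enumerate(particles):
        RECORD.pack_into(buf, i * RECORD.size, p.x, p.y, p.ident)
    return bytes(buf)


def deserialize(buf: bytes) -> List[Particle]:
    # Reverse step, run after results are copied back from the device.
    return [Particle(*fields) for fields in RECORD.iter_unpack(buf)]
```

Writing, maintaining, and matching this glue to the kernel's expected layout for every offloaded function is the burden that automatic tools aim to remove.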

Jul, 31

accULL: An User-directed Approach to Heterogeneous Programming

The world of HPC is undergoing rapid change, and the range of computer architectures capable of achieving high performance has broadened. The arrival of computational accelerators such as GPUs is increasing performance while keeping the cost per GFLOP low, thus expanding the popularity of HPC. However, it is still difficult to exploit the new, complex processor hierarchies. […]

CUDA • OpenCL

Jul, 31

MCMini: Monte Carlo on GPGPU

MCMini is a proof of concept that demonstrates the feasibility of Monte Carlo neutron transport using OpenCL, with a focus on performance. This implementation, written in C, shows that tracing particles and calculating reactions on a 3D mesh can be done in a highly scalable fashion. These results demonstrate a potential path forward for MCNP […]

OpenCL
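The core sampling step of Monte Carlo particle transport can be illustrated with the simplest possible case: a 1-D slab of purely absorbing material. Each particle's distance to its first collision is drawn from an exponential distribution with rate equal to the total cross section, and the fraction of particles that cross the slab should approach exp(-sigma_t · thickness). This is a textbook sketch, not MCMini's algorithm, which tracks particles through a 3-D mesh.

```python
import math
import random


def transmission(sigma_t: float, thickness: float, n: int, seed: int = 1) -> float:
    """Estimate the fraction of n particles crossing a purely absorbing slab."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n):
        # Distance to first collision is exponential with mean free path 1 / sigma_t.
        distance = rng.expovariate(sigma_t)
        if distance > thickness:
            transmitted += 1
    return transmitted / n


if __name__ == "__main__":
    estimate = transmission(sigma_t=1.0, thickness=1.0, n=200_000)
    print(estimate, math.exp(-1.0))  # the two values should agree to about 0.01
```

Every particle history is independent of the others, which is why the histories map naturally onto parallel work-items in an OpenCL kernel.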

Jul, 28

Fast Linear Algebra on GPU

GPUs have been successfully used to accelerate many mathematical functions and libraries. A common limitation of these libraries is that the primitives they handle must exceed a minimum size to achieve a significant speedup over their CPU versions. This minimum-size requirement can be prohibitive for many applications. It can be loosened by batching […]

OpenCL
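Batching packs many small problems into one buffer so that a single call, or kernel launch, processes all of them. This amortizes the launch and transfer overhead that dominates when each matrix is tiny. The sketch shows a strided layout in which matrix k starts at offset k·n·n. The layout is similar in spirit to strided-batched GEMM routines such as cuBLAS's `cublasSgemmStridedBatched`, although cuBLAS uses column-major storage. The function name and row-major layout here are illustrative, not the paper's API. The outer loop over k stands in for work that would run in parallel on a GPU.

```python
from typing import List


def batched_matmul(a: List[float], b: List[float], n: int, batch: int) -> List[float]:
    """Multiply `batch` pairs of n x n row-major matrices stored back to back.

    Matrix k of A occupies a[k*n*n:(k+1)*n*n]; the same holds for B and the result.
    """
    stride = n * n
    if len(a) != batch * stride or len(b) != batch * stride:
        raise ValueError("buffer sizes do not match batch * n * n")
    c = [0.0] * (batch * stride)
    for k in range(batch):  # on a GPU, one thread block (or work-group) per matrix
        base = k * stride
        for i in range(n):
            for j in range(n):
                acc = 0.0
                for p in range(n):
                    acc += a[base + i * n + p] * b[base + p * n + j]
                c[base + i * n + j] = acc
    return c
```

For example, one call with `n=2, batch=2` multiplies two independent 2 x 2 pairs stored in two 8-element lists.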

Jul, 28

Ensemble K-Means on Modern Many Core Hardware

Clustering involves partitioning a set of objects into subsets called clusters, so that objects in the same cluster are similar according to some metric. Clustering is widely used in fields such as machine learning, data mining, pattern recognition, and bioinformatics. The K-means algorithm is the most popular clustering algorithm, and it uses distance as the […]

OpenCL
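The standard K-means procedure (Lloyd's algorithm) alternates between two steps. The assignment step attaches each point to its nearest centroid, and the update step moves each centroid to the mean of its members. The minimal sketch below uses Euclidean distance and random initial centroids. The assignment step, with one distance computation per point-centroid pair, is the data-parallel part that many-core hardware targets. How the paper forms and combines its ensemble is not shown here.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels
```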

Jul, 23

LBCL: multi-device automatic load balancing

This paper presents the Load Balancing for OpenCL (lbcl) library, designed to automatically solve load-balancing issues in both multi-platform and heterogeneous environments. Using this library, a single kernel can be executed on a set of heterogeneous devices, with each device receiving an amount of work proportional to its computing power. A wrapper has been developed […]

OpenCL
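The proportional-split idea can be sketched as a static partition: give each device a share of the work-items proportional to its relative computing power. The sketch rounds with the largest-remainder method so that the shares always sum to the total. The `powers` values are assumed to be known in advance (for example, from a short benchmark). How lbcl actually measures or adapts them is not described in the abstract, and `split_work` is a hypothetical name, not part of the library.

```python
from typing import List


def split_work(total: int, powers: List[float]) -> List[int]:
    """Divide `total` work-items among devices in proportion to `powers`.

    Uses largest-remainder rounding so the shares always sum to `total`.
    """
    if total < 0 or not powers or any(p <= 0 for p in powers):
        raise ValueError("need total >= 0 and at least one positive power")
    weight = sum(powers)
    exact = [total * p / weight for p in powers]
    shares = [int(x) for x in exact]  # floor, since every x >= 0
    leftover = total - sum(shares)
    # Hand the remaining items to the devices with the largest fractional parts.
    order = sorted(range(len(powers)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # e.g. a GPU measured as three times faster than a CPU
    print(split_work(1000, [1.0, 3.0]))  # [250, 750]
```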

Jul, 23

A Comparative Study of OpenACC Implementations

GPUs and other accelerators are available in many different devices, and GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need to implement new algorithms from scratch, or to adapt sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, […]

CUDA • OpenCL

Jul, 20

Scaling CUDA for Distributed Heterogeneous Processors

The mainstream acceptance of heterogeneous computing and cloud computing points toward a future of distributed heterogeneous systems. With current software development tools, programming such complex systems is difficult and requires extensive knowledge of network and processor architectures. By providing an abstraction of the underlying network, the Message Passing Interface (MPI) has been the standard tool for developing […]

CUDA



Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
