high performance computing on graphics processing units: hgpu.org

Posts

Mar, 5

Fast LZW compression using a GPU

LZW compression is a well-known, patented lossless compression method used in the Unix file compression utility "compress" and in the GIF and TIFF image formats. It converts an input string of characters (or 8-bit unsigned integers) into a string of codes using a code table (or dictionary) that maps strings to codes. Since the code […]

CUDA
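The table-based encoding described in the LZW abstract above can be sketched sequentially as follows. This is a minimal textbook version written for illustration, not the GPU implementation from the paper: it emits plain integer codes, lets the table grow without limit, and omits the variable-width code packing that "compress", GIF, and TIFF use. The names `lzw_compress` and `lzw_decompress` are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes (sequential reference sketch)."""
    # The table starts with every single byte mapped to its own value (0..255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while the string is already in the table.
            current = candidate
        else:
            # Emit the code of the longest known match, then learn the new string.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes; the table is reconstructed on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

For example, `lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")` yields 16 codes for 24 input bytes, and `lzw_decompress` restores the original input. Each step of the loop depends on the table built so far, which is what makes the method awkward to parallelize.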

Mar, 5

Input Space Splitting for OpenCL

The performance of OpenCL programs suffers from memory and control flow divergence. Therefore, OpenCL compilers employ static analyses to identify non-divergent control flow and memory accesses in order to produce faster code. However, divergence is often input-dependent, hence can be observed for some, but not all inputs. In these cases, vectorizing compilers have to generate […]

OpenCL

Mar, 5

Heterogeneous parallel algorithms for Computational Fluid Dynamics on unstructured meshes

Frontiers of computational fluid dynamics (CFD) are constantly expanding and eagerly demanding more computational resources. Currently, we are experiencing a rapid evolution in high performance computing systems, driven by power consumption constraints. New HPC nodes incorporate accelerators that are used as math co-processors for increasing the throughput and the FLOP per watt ratio. On […]

CUDA • OpenCL

Mar, 5

Metamorphic Testing for (Graphics) Compilers

We present strategies for metamorphic testing of compilers using opaque value injection, and experiences using the method to test compilers for the OpenGL shading language.

OpenCL • OpenGL

Mar, 3

Hierarchical Semantic Parsing for Object Pose Estimation in Densely Cluttered Scenes

Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. […]

CUDA

Mar, 3

Hadoop Mapreduce OpenCL Plugin

Modern systems generate huge amounts of information in areas such as finance, telematics, healthcare, and IoT devices, to name a few. Modern computing frameworks like MapReduce need an ever-increasing amount of computing power to sort, arrange, and generate insights from the data. This project is an attempt to harness the power of heterogeneous […]

OpenCL

Mar, 3

Full reconstruction of a 14-qubit state within four hours

Full quantum state tomography (FQST) plays a unique role in the estimation of the state of a quantum system without a priori knowledge or assumptions. Unfortunately, since FQST requires informationally (over)complete measurements, both the number of measurement bases and the computational complexity of data processing suffer an exponential growth with the size of the quantum […]

CUDA

Mar, 3

Heuristics for the Variable Sized Bin Packing Problem Using a Hybrid P-System and CUDA Architecture

The Variable Sized Bin Packing Problem has a wide range of application areas including packing, scheduling, and manufacturing. Given a list of items and variable sized bin types, the objective is to minimize the total size of the used bins. This problem is known to be NP-hard. In this article, we present two new heuristics […]

CUDA
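The problem statement in the abstract above (items, several bin sizes, minimize the total size of the bins used) can be made concrete with a small greedy baseline. This sketch is not one of the two heuristics from the article, and it does not use a P-system or CUDA; it is a plain first-fit-decreasing variant, assuming an unlimited supply of each bin type. The names `pack_greedy` and `total_bin_size` are hypothetical.

```python
def pack_greedy(items, bin_types):
    """First-fit decreasing: try open bins first, else open the smallest bin that fits."""
    bins = []  # each bin is a dict with its size, free space, and packed items
    for item in sorted(items, reverse=True):
        for b in bins:
            if b["free"] >= item:
                # Reuse the first open bin that still has room for this item.
                b["items"].append(item)
                b["free"] -= item
                break
        else:
            # No open bin fits: open the smallest bin type that can hold the item.
            fitting = [size for size in bin_types if size >= item]
            if not fitting:
                raise ValueError(f"item of size {item} fits no bin type")
            size = min(fitting)
            bins.append({"size": size, "free": size - item, "items": [item]})
    return bins


def total_bin_size(bins):
    """Objective value: the summed size of every bin that was opened."""
    return sum(b["size"] for b in bins)


if __name__ == "__main__":
    packing = pack_greedy([7, 5, 4, 3, 1], bin_types=[5, 10])
    for b in packing:
        print(b["size"], b["items"])
    print("total size of used bins:", total_bin_size(packing))
```

With items `[7, 5, 4, 3, 1]` and bin types `[5, 10]`, this baseline opens one bin of size 10 and two of size 5, for a total of 20, which equals the total item size. A greedy rule like this gives no optimality guarantee in general, which is why stronger heuristics are studied for this NP-hard problem.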

Mar, 3

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their ability to use different representations per layer while maintaining accuracy. This flexibility provides an additional opportunity to improve performance and energy compared to conventional DNN implementations that use a single, uniform representation for all layers throughout the […]

Mar, 2

International Conference on Cloud Computing and Big Data (ICCCBD), 2016

The 2016 International Conference on Cloud Computing and Big Data (ICCCBD 2016) will be held July 5-7, 2016 in Chengdu, China, technically sponsored by the Sichuan Institute of Electronics and the Sichuan Province Computer Federation. Paper publication: all accepted papers must be written in English and will be published in the conference proceedings by IEEE. The proceedings will […]

Mar, 1

Alpaka – An Abstraction Library for Parallel Kernel Acceleration

Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in advancing the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical […]

CUDA

Mar, 1

DeepSpark: Spark-Based Deep Learning Supporting Asynchronous Updates and Caffe Compatibility

The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data processing pipelines for handling the massive data and parameters involved in DNN training. Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge. In this paper, we propose DeepSpark, a distributed and parallel deep learning […]

CUDA
