
Posts

Mar, 5

Fast LZW compression using a GPU

LZW compression is a well-known patented lossless compression method used in the Unix file compression utility "compress" and in the GIF and TIFF image formats. It converts an input string of characters (or 8-bit unsigned integers) into a string of codes using a code table (or dictionary) that maps strings to codes. Since the code […]
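As a rough illustration of the code-table idea described in this abstract, here is a minimal sequential sketch in Python. It is not the GPU method from the post: the function names `lzw_compress` and `lzw_decompress` are hypothetical, the table is allowed to grow without limit, and codes are kept as plain integers rather than packed into variable-width bit fields as real GIF or TIFF encoders do.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of integer codes using a growing code table."""
    # Assumption: the table starts with one entry per 8-bit value (codes 0..255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while the string is already known.
            current = candidate
        else:
            # Emit the code for the longest known string, then learn the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes by reconstructing the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"ABABABA"
    encoded = lzw_compress(sample)
    print(encoded)
    print(lzw_decompress(encoded) == sample)
```

The decoder never receives the table; it rebuilds it from the code stream, which is why each emitted code depends on the entries created before it.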
Mar, 5

Input Space Splitting for OpenCL

The performance of OpenCL programs suffers from memory and control flow divergence. Therefore, OpenCL compilers employ static analyses to identify non-divergent control flow and memory accesses in order to produce faster code. However, divergence is often input-dependent, hence can be observed for some, but not all inputs. In these cases, vectorizing compilers have to generate […]
Mar, 5

Heterogeneous parallel algorithms for Computational Fluid Dynamics on unstructured meshes

Frontiers of computational fluid dynamics (CFD) are constantly expanding and eagerly demanding more computational resources. Currently, we are experiencing a rapid evolution in high-performance computing systems driven by power consumption constraints. New HPC nodes incorporate accelerators that are used as math co-processors for increasing the throughput and the FLOP per watt ratio. On […]
Mar, 5

Metamorphic Testing for (Graphics) Compilers

We present strategies for metamorphic testing of compilers using opaque value injection, and experiences using the method to test compilers for the OpenGL shading language.
Mar, 3

Hierarchical Semantic Parsing for Object Pose Estimation in Densely Cluttered Scenes

Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. […]
Mar, 3

Hadoop Mapreduce OpenCL Plugin

Modern systems generate huge amounts of information in areas such as finance, telematics, healthcare, and IoT devices, to name a few. Modern computing frameworks like MapReduce need an ever-increasing amount of computing power to sort, arrange, and generate insights from the data. This project is an attempt to harness the power of heterogeneous […]
Mar, 3

Full reconstruction of a 14-qubit state within four hours

Full quantum state tomography (FQST) plays a unique role in the estimation of the state of a quantum system without a priori knowledge or assumptions. Unfortunately, since FQST requires informationally (over)complete measurements, both the number of measurement bases and the computational complexity of data processing suffer an exponential growth with the size of the quantum […]
Mar, 3

Heuristics for the Variable Sized Bin Packing Problem Using a Hybrid P-System and CUDA Architecture

The Variable Sized Bin Packing Problem has a wide range of application areas including packing, scheduling, and manufacturing. Given a list of items and variable sized bin types, the objective is to minimize the total size of the used bins. This problem is known to be NP-hard. In this article, we present two new heuristics […]
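To make the problem statement above concrete, the following Python sketch packs items with a classic first-fit-decreasing baseline and then reports the objective, the total size of the used bins. This is not one of the two heuristics from the article, which is truncated here; the names `first_fit_decreasing` and `total_bin_size` are hypothetical, and item sizes and bin capacities are assumed to be positive integers.

```python
def first_fit_decreasing(items, bin_types):
    """Greedy baseline: returns a list of (capacity, contents) pairs."""
    largest = max(bin_types)
    bins = []  # each open bin is [capacity, remaining space, contents]
    for item in sorted(items, reverse=True):
        if item > largest:
            raise ValueError(f"item {item} does not fit in any bin type")
        for open_bin in bins:
            if open_bin[1] >= item:
                # Place the item in the first open bin with enough room.
                open_bin[1] -= item
                open_bin[2].append(item)
                break
        else:
            # No open bin fits, so open a new bin of the largest type.
            bins.append([largest, largest - item, [item]])
    # Shrink each bin to the smallest type that still holds its load.
    packing = []
    for capacity, remaining, contents in bins:
        load = capacity - remaining
        best = min(t for t in bin_types if t >= load)
        packing.append((best, contents))
    return packing


def total_bin_size(packing):
    """Objective from the problem statement: sum of the used bin sizes."""
    return sum(capacity for capacity, _ in packing)


if __name__ == "__main__":
    packing = first_fit_decreasing([7, 5, 4, 3, 1], [5, 10])
    print(packing)
    print(total_bin_size(packing))
```

The final shrinking step is what distinguishes this from fixed-size bin packing: a lightly loaded bin can be swapped for a smaller type, which lowers the objective without moving any items.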
Mar, 3

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their ability to use different representations per layer while maintaining accuracy. This flexibility provides an additional opportunity to improve performance and energy compared to conventional DNN implementations that use a single, uniform representation for all layers throughout the […]
Mar, 2

International Conference on Cloud Computing and Big Data (ICCCBD), 2016

The 2016 International Conference on Cloud Computing and Big Data (ICCCBD 2016) will be held July 5-7, 2016 in Chengdu, China, technically sponsored by the Sichuan Institute of Electronics and the Sichuan Province Computer Federation. Paper Publication: All accepted papers must be written in English and will be published in the conference proceedings by IEEE. The proceedings will […]
Mar, 1

Alpaka – An Abstraction Library for Parallel Kernel Acceleration

Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in advancing the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical […]
Mar, 1

DeepSpark: Spark-Based Deep Learning Supporting Asynchronous Updates and Caffe Compatibility

The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data processing pipelines for handling the massive data and parameters involved in DNN training. Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge. In this paper, we propose DeepSpark, a distributed and parallel deep learning […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org