
Posts

May, 5

Principles, Techniques, and Tools for Explicit and Automatic Parallelization

The end of Dennard scaling also brought an end to frequency scaling as a means to improve performance. Chip manufacturers had to abandon frequency and superscalar scaling as processors became increasingly power constrained. An architecture’s power budget became the limiting factor to performance gains, and computations had to be performed more energy-efficiently. Designers turned to […]
May, 5

Compressed Learning of Deep Neural Networks for OpenCL-Capable Embedded Systems

Deep neural networks (DNNs) have been quite successful in solving many complex learning problems. However, DNNs tend to have a large number of learning parameters, leading to a large memory and computation requirement. In this paper, we propose a model compression framework for efficient training and inference of deep neural networks on embedded systems. Our […]
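The excerpt does not show the paper's specific compression method, but a common baseline for reducing DNN parameter counts is magnitude-based weight pruning. A minimal NumPy sketch (the function name and sparsity level are illustrative, not from the paper):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping the top (1 - sparsity) fraction."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k)[k] if k < len(flat) else np.inf
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print(f"kept {mask.mean():.1%} of weights")
```

Pruned weight matrices can then be stored in sparse formats, shrinking both the memory footprint and the multiply-accumulate count at inference time.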
May, 5

An Architectural Journey into RISC Architectures for HPC Workloads

The race to the Exascale (i.e., 10^18 floating-point operations per second) together with the slow-down of Moore’s law is posing unprecedented challenges to the whole High-Performance Computing (HPC) community. Computer architects, system integrators and software engineers studying programming models for handling parallelism are especially called to the rescue in a moment like the one […]
May, 5

Full-stack Optimization for Accelerating CNNs with FPGA Validation

We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate arrays (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization and inference accuracy. […]
May, 5

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, […]
May, 1

Evaluating the Arm Ecosystem for High Performance Computing

In recent years, Arm-based processors have arrived on the HPC scene, offering an alternative to the existing status quo, which was largely dominated by x86 processors. In this paper, we evaluate the Arm ecosystem, both the hardware offering and the software stack that is available to users, by benchmarking a production HPC platform that uses Marvell’s […]
May, 1

Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL

Multi-accelerator platforms combine CPUs and different accelerator architectures within a single compute node. Such systems are capable of processing parallel workloads very efficiently while being more energy efficient than regular systems consisting of CPUs only. However, the architectures of such systems are diverse, forcing developers to port applications to each accelerator using different programming languages, […]
May, 1

The Risks of WebGL: Analysis, Evaluation and Detection

WebGL is a browser feature that enables JavaScript-based control of the graphics processing unit (GPU) to render interactive 3D and 2D graphics without the use of plug-ins. Exploiting WebGL for attacks would affect billions of users, since browsers serve as the main interaction mechanism with the World Wide Web. This paper explores the potential threats […]
Apr, 28

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

The ability to model, analyze, and predict the execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high-performance parallel applications. In today’s increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs. […]
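The simplest form of black-box execution-time prediction is fitting a model to measured runtimes without inspecting the kernel's internals. A minimal sketch, assuming a kernel whose runtime scales roughly linearly with input size (the timing values below are made up for illustration):

```python
import numpy as np

# hypothetical measured (elements, milliseconds) pairs for one kernel on one GPU
sizes = np.array([1e6, 2e6, 4e6, 8e6])
times = np.array([0.9, 1.5, 2.8, 5.3])

# black-box linear model: t(n) ~ overhead + n / throughput
slope, intercept = np.polyfit(sizes, times, 1)
predicted = intercept + slope * 16e6
print(f"predicted time for 16M elements: {predicted:.2f} ms")
```

Cross-machine modeling, as the paper's title suggests, is harder than this single-device fit: the model must balance accuracy on one architecture against transferability to others.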
Apr, 28

Chunkflow: Distributed Hybrid Cloud Processing of Large 3D Images by Convolutional Nets

It is now common to process volumetric biomedical images using 3D Convolutional Networks (ConvNets). This can be challenging for the teravoxel and even petavoxel images that are being acquired today by light or electron microscopy. Here we introduce chunkflow, a software framework for distributing ConvNet processing over local and cloud GPUs and CPUs. The image […]
Apr, 28

Wasserstein-Fisher-Rao Document Distance

Measuring the distance between documents is a fundamental problem of natural language processing. Among the existing methods, the Word Mover’s Distance (WMD) has shown remarkable success in document semantic matching for its clear physical insight as a parameter-free model. However, WMD is essentially based on the classical Wasserstein metric, thus […]
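The Wasserstein metric underlying WMD treats two documents as distributions over word embeddings and finds the cheapest way to transport one onto the other. A minimal sketch solving the discrete optimal-transport linear program with SciPy (the two-word "documents" and 2-D embeddings are toy assumptions, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein(cost, p, q):
    """Solve the discrete OT linear program: min <T, C> s.t. T's row sums = p, column sums = q."""
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j T[i, j] = p[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i T[i, j] = q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# toy documents as bags of (hypothetical) 2-D word embeddings, uniform word weights
doc1 = np.array([[0.0, 0.0], [1.0, 0.0]])
doc2 = np.array([[0.0, 1.0], [1.0, 1.0]])
p = np.array([0.5, 0.5]); q = np.array([0.5, 0.5])
C = np.linalg.norm(doc1[:, None, :] - doc2[None, :, :], axis=-1)  # pairwise word distances
print(wasserstein(C, p, q))  # 1.0: each word moves straight up by distance 1
```

The classical Wasserstein metric requires the two distributions to have equal total mass; the Wasserstein-Fisher-Rao distance in the title relaxes exactly this constraint by allowing mass creation and destruction at a cost.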
Apr, 28

GPU-based Efficient Join Algorithms on Hadoop

Growing data volumes have brought tremendous pressure on query processing and storage, so many studies focus on using GPUs to accelerate the join operation, which is one of the most important operations in modern database systems. However, existing research on GPU-accelerated joins is not well suited to the join operation on big […]
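The join algorithms that such work accelerates typically follow the classic build/probe structure of a hash join. A minimal CPU sketch of an equi-join on (key, value) tuples, shown here only to illustrate the structure that GPU implementations parallelize (the relation names are illustrative):

```python
from collections import defaultdict

def hash_join(left, right):
    """Equi-join two lists of (key, value) tuples: build a hash table on left, probe with right."""
    table = defaultdict(list)
    for k, v in left:            # build phase
        table[k].append(v)
    out = []
    for k, w in right:           # probe phase
        for v in table.get(k, []):
            out.append((k, v, w))
    return out

orders = [(1, "alice"), (2, "bob"), (1, "carol")]
items  = [(1, "gpu"), (3, "cpu")]
print(hash_join(orders, items))  # [(1, 'alice', 'gpu'), (1, 'carol', 'gpu')]
```

On a GPU, both phases are parallelized across thousands of threads, and on Hadoop-scale data the relations must additionally be partitioned so that matching keys land on the same node before the per-partition joins run.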

* * *


HGPU group © 2010-2019 hgpu.org

All rights belong to the respective authors

Contact us: