## Posts

Nov, 25

### GPU-based Acceleration of Deep Convolutional Neural Networks on Mobile Platforms

Mobile applications running on wearable devices and smartphones can greatly benefit from accurate and scalable deep CNN-based machine learning algorithms. While mobile CPU performance does not match the intensive computational requirement of deep CNNs, the embedded GPU which already exists in many mobile platforms can be leveraged for acceleration of CNN computations on the local […]

Nov, 25

### Pulsar Acceleration Searches on the GPU for the Square Kilometre Array

Pulsar acceleration searches are methods for recovering signals from radio telescopes that may otherwise be lost due to the effect of orbital acceleration in binary systems. The vast amount of data that will be produced by next-generation instruments such as the Square Kilometre Array (SKA) necessitates real-time acceleration searches, which in turn requires the […]

Nov, 24

### Learning Representation for Scene Understanding: Epitomes, CRFs, and CNNs

Scene understanding, such as image classification and semantic image segmentation, has been a challenging problem in computer vision. The difficulties mainly come from the feature representation, i.e., how to find a good representation for images. Instead of improving over hand-crafted features such as SIFT or HoG, we focus on learning image representations by generative and […]

Nov, 24

### A parallel algorithm for the constrained shortest path problem on lattice graphs

We present a parallel algorithm for finding the shortest path whose total weight is smaller than a pre-determined value. The passage times over the edges are assumed to be positive integers. In each step, the processing elements do not analyze the entire graph; instead, they focus on a subset of vertices called active vertices. […]

Nov, 24

### Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Although the latest high-end smartphones have powerful CPUs and GPUs, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. The […]

Nov, 24

### Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning

Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of four deep learning frameworks, namely Caffe, Neon, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is performed […]

Nov, 24

### Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers […]

Nov, 20

### Large Scale Artificial Neural Network Training Using Multi-GPUs

This paper describes a method for accelerating large-scale artificial neural network (ANN) training on multiple GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix multiplication algorithm and integrate it with ANN training. The experiments demonstrate that our matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous […]
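To make the phrase "reducing the forward and backward passes to matrix multiplication" concrete, here is a minimal single-machine sketch for one fully connected layer. It is only an illustration under stated assumptions: the layer is linear with no activation, and the helper names (`matmul`, `transpose`, `forward`, `backward`) are hypothetical. It does not reproduce the paper's out-of-core multi-GPU algorithm.

```python
# Illustrative sketch only: one linear layer whose forward and backward
# passes are each expressed as matrix products. All names are hypothetical.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(len(b[0]))]
            for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def forward(x, w):
    # x: batch x inputs, w: inputs x outputs -> y: batch x outputs
    return matmul(x, w)

def backward(x, w, grad_y):
    # grad_y: batch x outputs (gradient of the loss with respect to y)
    grad_w = matmul(transpose(x), grad_y)  # inputs x outputs
    grad_x = matmul(grad_y, transpose(w))  # batch x inputs
    return grad_x, grad_w

x = [[1.0, 2.0], [3.0, 4.0]]             # two samples, two inputs each
w = [[0.5, -1.0, 0.0], [1.0, 0.0, 2.0]]  # two inputs, three outputs
y = forward(x, w)
grad_x, grad_w = backward(x, w, [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
```

Because all three products are plain matrix multiplications, any faster multiplication routine can in principle be swapped in for `matmul` without changing the layer logic.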

Nov, 20

### Recurrent Neural Networks Hardware Implementation on FPGA

Recurrent Neural Networks (RNNs) have the ability to retain memory and learn data sequences, and are a recent breakthrough in machine learning. Due to the recurrent nature of RNNs, it is sometimes hard to parallelize all their computations on conventional hardware. CPUs do not currently offer large parallelism, while GPUs offer limited parallelism due to […]

Nov, 20

### Supervised Hashing with Deep Neural Networks

In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively "shallow" networks limited by the issues arising in backpropagation (vanishing gradients) as well as computational efficiency. We propose a novel and efficient training algorithm inspired by the alternating direction method of […]

Nov, 20

### GPU-accelerated adjoint algorithmic differentiation

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store […]
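As a rough illustration of the adjoint (reverse-mode) idea described above, the following sketch records a tiny computation graph for scalars and then sweeps it backwards to accumulate gradients. It is a generic CPU-only toy, not the GPU method from the post; the `Var` class and the `sin` and `backpropagate` helpers are hypothetical names introduced here.

```python
import math

class Var:
    """Scalar that remembers how it was computed (hypothetical helper)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.adjoint = 0.0      # d(output)/d(this), filled in by backpropagate

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def sin(v):
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))

def backpropagate(output):
    """Propagate adjoints from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: parents are appended before their children.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.adjoint = 1.0
    # Reverse order guarantees a node's adjoint is complete before it is used.
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.adjoint += partial * node.adjoint

# f(x, y) = x * y + sin(x), so df/dx = y + cos(x) and df/dy = x
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backpropagate(f)
```

Every intermediate `Var` stays alive until the backward sweep has finished, which hints at why reverse-mode differentiation of long computations can need a lot of memory.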

Nov, 20

### GPU-Based Inverse Rendering With Multi-Objective Particle Swarm Optimization

We present a novel, GPU-accelerated per-pixel inverse rendering (IR) optimization algorithm based on Particle Swarm Optimization (PSO), IRPSO. IRPSO estimates the per-pixel scene attributes, including reflectance properties of a 3D model, and is fast enough to do in situ visualization of the optimization in real time. We utilize the GPU framebuffer as a computational domain, where […]