Posts
Nov, 20
Recurrent Neural Networks Hardware Implementation on FPGA
Recurrent Neural Networks (RNNs) have the ability to retain memory and learn data sequences, and are a recent breakthrough in machine learning. Due to the recurrent nature of RNNs, it is sometimes hard to parallelize all of their computations on conventional hardware. CPUs do not currently offer large parallelism, while GPUs offer limited parallelism due to […]
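The sequential bottleneck mentioned above comes from the recurrence itself: each hidden state needs the previous one. The sketch below is a minimal scalar Elman-style RNN, chosen only for illustration (the function name `rnn_forward` and the weights are assumptions, not the paper's hardware design). It shows that the loop over time steps cannot be split across parallel units, although the work inside each step could be.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Each iteration reads h from the previous iteration, so the time
    steps form a dependency chain and must run one after another.
    """
    h = h0
    hs = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

# Hypothetical weights, for illustration only.
hs = rnn_forward([1.0, 0.0, -1.0], w_x=0.5, w_h=0.8, b=0.0)
```

With vector-valued states, the per-step matrix-vector products are where FPGA or GPU parallelism can be applied; the chain across time remains.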
Nov, 20
Supervised Hashing with Deep Neural Networks
In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively "shallow" networks, limited by issues arising in backpropagation (vanishing gradients) and by computational efficiency. We propose a novel and efficient training algorithm inspired by alternating direction method of […]
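Once a network produces real-valued outputs, hash codes are commonly obtained by binarizing them and compared by Hamming distance. The sketch below shows only that retrieval side, assuming sign-based binarization; the helper names are hypothetical and the paper's ADMM-inspired training procedure is not reproduced here.

```python
def to_hash_code(features):
    """Binarize real-valued network outputs with the sign function (>= 0 -> 1)."""
    return tuple(1 if f >= 0 else 0 for f in features)

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database):
    """Return database keys sorted by Hamming distance to the query code."""
    return sorted(database, key=lambda k: hamming(query_code, database[k]))

# Hypothetical network outputs for two stored items and one query.
database = {
    "a": to_hash_code([0.3, -1.2, 0.8, -0.1]),
    "b": to_hash_code([-0.5, 0.4, -0.2, 0.9]),
}
query = to_hash_code([0.1, -0.7, 0.5, 0.2])
ranking = retrieve(query, database)
```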
Nov, 20
Large Scale Artificial Neural Network Training Using Multi-GPUs
This paper describes a method for accelerating large scale Artificial Neural Networks (ANN) training using multi-GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix multiplication and integrate the algorithm with the ANN training. The experiments demonstrate that our matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous […]
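A layer's forward pass can be written as a matrix product Y = X W, and a product too large for one device can be split into tiles. The sketch below is a generic blocked matrix multiplication, not the paper's out-of-core algorithm; the function names and the `tile` parameter are assumptions. Each output tile depends only on a row panel of A and a column panel of B, which is what makes distributing tiles across GPUs and streaming panels from host memory possible.

```python
def matmul(a, b):
    """Reference product of row-major nested lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def tiled_matmul(a, b, tile):
    """C = A @ B computed one (tile x tile) output block at a time.

    The (i0, j0) block of C accumulates products of A's (i0, p0) blocks
    and B's (p0, j0) blocks, so independent output blocks could be
    assigned to different devices.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] += s
    return c

# X: batch of 2 inputs with 3 features; W: weights mapping 3 -> 2 outputs.
X = [[1, 2, 3], [4, 5, 6]]
W = [[7, 8], [9, 10], [11, 12]]
Y = tiled_matmul(X, W, tile=2)
```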
Nov, 20
GPU-Based Inverse Rendering With Multi-Objective Particle Swarm Optimization
We present a novel, GPU-accelerated per-pixel inverse rendering (IR) optimization algorithm based on Particle Swarm Optimization (PSO), IRPSO. IRPSO estimates the per-pixel scene attributes including reflectance properties of a 3D model, and is fast enough to perform in situ visualization of the optimization in real time. We utilize the GPU framebuffer as a computational domain, where […]
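For readers unfamiliar with PSO, the sketch below is a basic single-objective, global-best PSO minimizing a test function. It is not IRPSO: the multi-objective formulation and the per-pixel framebuffer mapping are not shown. The coefficients are the widely used inertia/acceleration values 0.7298 and 1.49618; the function names and the sphere objective are assumptions for illustration.

```python
import random

def pso(f, dim, n_particles=20, iters=200, w=0.7298, c1=1.49618, c2=1.49618,
        bound=5.0, seed=0):
    """Minimize f over R^dim with global-best particle swarm optimization."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    return sum(v * v for v in x)

best, best_val = pso(sphere, dim=3)
```

Each particle's objective evaluation is independent within an iteration, which is the part that maps naturally onto many GPU threads.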
Nov, 20
GPU-accelerated adjoint algorithmic differentiation
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store […]
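The memory issue mentioned above arises because reverse-mode (adjoint) differentiation must record every intermediate operation before propagating adjoints backward. The sketch below is a minimal tape-style implementation on scalars, for illustration only; the class `Var` and the helpers `sin` and `backward` are hypothetical and unrelated to the paper's GPU implementation. The stored `parents` lists are exactly the data whose size grows with program length.

```python
import math

class Var:
    """Scalar node recorded for reverse-mode (adjoint) differentiation."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))

def backward(output):
    """Propagate adjoints from output to all recorded inputs."""
    order, seen = [], set()
    def visit(v):  # post-order DFS gives a topological order
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)
    visit(output)
    output.adjoint = 1.0
    for v in reversed(order):  # each node's adjoint is complete before use
        for p, local in v.parents:
            p.adjoint += local * v.adjoint

# f(x, y) = x*y + sin(x); df/dx = y + cos(x), df/dy = x
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backward(f)
```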
Nov, 13
Fast Neuromimetic Object Recognition using FPGA Outperforms GPU Implementations
Recognition of objects in still images has traditionally been regarded as a difficult computational problem. Although modern automated methods for visual object recognition have achieved steadily increasing recognition accuracy, even the most advanced computational vision approaches are unable to obtain performance equal to that of humans. This has led to the creation of many biologically-inspired […]
Nov, 13
GEMMbench: a framework for reproducible and collaborative benchmarking of matrix multiplication
The generic matrix-matrix multiplication (GEMM) is arguably the most popular computational kernel of the 20th century. Yet, surprisingly, no common methodology for evaluating GEMM performance has been established over the many decades of using GEMM for comparing architectures, compilers and ninja-class programmers. We introduce GEMMbench, a framework and methodology for evaluating performance of GEMM implementations. […]
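As a reference point for what a GEMM benchmark measures, the sketch below implements the BLAS-style update C ← αAB + βC and reports throughput using the usual convention of 2·m·n·k floating-point operations. This is a hypothetical illustration in plain Python, not GEMMbench's interface; the names `gemm`, `gflops` and `bench` are assumptions.

```python
import time

def gemm(alpha, a, b, beta, c):
    """Reference GEMM: C <- alpha * A @ B + beta * C (row-major lists, in place)."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]
    return c

def gflops(m, n, k, seconds):
    """m*n*k multiply-adds, conventionally counted as 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e9

def bench(m, n, k, repeats=3):
    """Time the reference GEMM and report the best of several runs in GFLOP/s."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        c = [[0.0] * n for _ in range(m)]
        t0 = time.perf_counter()
        gemm(1.0, a, b, 0.0, c)
        best = min(best, time.perf_counter() - t0)
    return gflops(m, n, k, best)
```

Taking the minimum over repeats is one common way to reduce timing noise; a reproducible methodology would also record the problem sizes, data layout, compiler and hardware alongside the number.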
Nov, 13
Accelerating Recommender Systems using GPUs
We describe GPU implementations of the matrix factorization recommender algorithms CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with existing multi-core versions of the same algorithms. The GPU implementations outperform the multi-core versions, with a maximum speedup of 14.8x.
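To make the ALS idea concrete, the sketch below fits a rank-1 factorization R[u][i] ≈ p[u]·q[i] on observed ratings only, alternately solving for all user factors and then all item factors in closed form. Rank 1 is an assumption made to keep each solve a scalar ridge regression; real ALS solves a small k×k system per user or item. The function name `als_rank1` and the data are hypothetical.

```python
def als_rank1(ratings, n_users, n_items, lam=1e-3, iters=100):
    """Rank-1 ALS on observed (user, item, rating) triples with L2 penalty lam."""
    p = [1.0] * n_users
    q = [1.0] * n_items
    for _ in range(iters):
        # Fix q; each p[u] minimizes sum_i (r - p[u]*q[i])^2 + lam*p[u]^2.
        num, den = [0.0] * n_users, [lam] * n_users
        for u, i, r in ratings:
            num[u] += r * q[i]
            den[u] += q[i] * q[i]
        p = [num[u] / den[u] for u in range(n_users)]
        # Fix p; each q[i] is solved the same way.
        num, den = [0.0] * n_items, [lam] * n_items
        for u, i, r in ratings:
            num[i] += r * p[u]
            den[i] += p[u] * p[u]
        q = [num[i] / den[i] for i in range(n_items)]
    return p, q

# Hypothetical ratings from an exact rank-1 matrix, with entry (2, 2) held out.
true_p, true_q = [1.0, 2.0, 3.0], [2.0, 1.0, 3.0]
ratings = [(u, i, true_p[u] * true_q[i])
           for u in range(3) for i in range(3) if (u, i) != (2, 2)]
p, q = als_rank1(ratings, 3, 3)
predicted = p[2] * q[2]  # true value is 9
```

Within each half-step every user (or item) update is independent, which is why ALS parallelizes well on both multi-core CPUs and GPUs.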
Nov, 13
Accelerating Adaptive IDW Interpolation Algorithm on a Single GPU
This paper focuses on the design and implementation of a GPU-accelerated Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm. AIDW is an improved version of standard IDW that adaptively determines the power parameter according to the spatial distribution pattern of the data points, and achieves more accurate predictions than standard IDW. In this paper, we first […]
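For reference, standard IDW estimates the value at a query point as a weighted mean of the sample values, with weights 1/d^α for distance d and power α. The sketch below implements that standard form with a fixed `power`; the adaptive rule AIDW uses to choose α per query point is described in the paper and is not reproduced here. Function and variable names are assumptions.

```python
import math

def idw(points, query, power=2.0, eps=1e-12):
    """Standard IDW: sum(w_i * z_i) / sum(w_i) with w_i = 1 / d_i**power.

    points: iterable of (x, y, z) samples; query: (x, y).
    """
    num = den = 0.0
    for x, y, z in points:
        d = math.hypot(x - query[0], y - query[1])
        if d < eps:
            return z  # query coincides with a sample point
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
```

Every query point is interpolated independently, which is what lets a GPU version assign one thread per query; an adaptive variant would compute a per-query `power` before the weighting loop.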
Nov, 13
A Survey Of Techniques for Architecting and Managing Asymmetric Multicore Processors
To meet the needs of a diverse range of workloads, asymmetric multicore processors (AMPs) have been proposed, which feature cores with different microarchitectures or ISAs. However, given the diversity inherent in their design and application scenarios, several challenges need to be addressed to effectively architect AMPs and leverage their potential in optimizing both sequential and parallel […]
Nov, 12
FIESTA 4: optimized Feynman integral calculations with GPU support
This paper presents a new major release of the program FIESTA (Feynman Integral Evaluation by a Sector decomposiTion Approach). The new release is mainly aimed at optimal performance at large scales, where the number of sampling points is increased in order to reduce the uncertainty estimates. The release now supports graphics processing units (GPUs) […]
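The link between the number of sampling points and the uncertainty estimate can be seen with plain Monte Carlo integration, whose standard error falls roughly as 1/√N. The sketch below integrates a toy function over the unit hypercube; it does not implement sector decomposition or FIESTA's own integrators, and the name `mc_integrate` is hypothetical.

```python
import math
import random

def mc_integrate(f, dim, n, seed=0):
    """Plain Monte Carlo over [0, 1]^dim; returns (estimate, standard error)."""
    rng = random.Random(seed)
    s = s2 = 0.0
    for _ in range(n):
        v = f([rng.random() for _ in range(dim)])
        s += v
        s2 += v * v
    mean = s / n
    var = (s2 / n - mean * mean) * n / (n - 1)  # unbiased sample variance
    return mean, math.sqrt(var / n)

def integrand(x):
    return x[0] * x[1]  # exact integral over the unit square is 1/4

est, se = mc_integrate(integrand, 2, 40000)
_, se_quarter = mc_integrate(integrand, 2, 10000, seed=1)
```

Quadrupling the sample count roughly halves the standard error, so tighter uncertainty targets quickly demand many more integrand evaluations; those evaluations are independent, which is what makes GPU offloading attractive.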
Nov, 12
Microlensing Observations Rapid Search for Exoplanets: MORSE code for GPUs
The rapid analysis of ongoing gravitational microlensing events has been integral to the successful detection and characterisation of cool planets orbiting low mass stars in the Galaxy. In this paper we present an implementation of search and fit techniques on Graphics Processing Unit (GPU) hardware. The method allows for the rapid identification of candidate planetary microlensing […]
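Planetary signals appear as short deviations from the single-lens (Paczyński) light curve, whose point-source magnification is A(u) = (u² + 2) / (u·√(u² + 4)) with u(t) = √(u₀² + ((t − t₀)/t_E)²). The sketch below evaluates that baseline model and fits it with a brute-force χ² grid search, an assumed stand-in chosen because every grid point is independent and therefore GPU-friendly; it is not the MORSE code itself, and the names are hypothetical.

```python
import math

def magnification(u):
    """Point-source, point-lens magnification A(u) = (u^2 + 2) / (u sqrt(u^2 + 4))."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))

def paczynski(t, t0, u0, tE):
    """Single-lens light curve; u is the lens-source separation in Einstein radii."""
    u = math.sqrt(u0 * u0 + ((t - t0) / tE) ** 2)
    return magnification(u)

def grid_fit(times, fluxes, t0_grid, u0_grid, tE_grid):
    """Return (chi2, t0, u0, tE) minimizing unweighted chi^2 over the grid."""
    best = None
    for t0 in t0_grid:
        for u0 in u0_grid:
            for tE in tE_grid:
                chi2 = sum((f - paczynski(t, t0, u0, tE)) ** 2
                           for t, f in zip(times, fluxes))
                if best is None or chi2 < best[0]:
                    best = (chi2, t0, u0, tE)
    return best

# Hypothetical noiseless event with t0 = 10, u0 = 0.3, tE = 5 (time units arbitrary).
times = [0.5 * i for i in range(41)]
fluxes = [paczynski(t, 10.0, 0.3, 5.0) for t in times]
best = grid_fit(times, fluxes, [8.0, 10.0, 12.0], [0.1, 0.3, 0.5], [3.0, 5.0, 7.0])
```

Residuals from the best single-lens fit are where a search would then look for short planetary anomalies.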

