Posts
Aug, 5
Mapping a Guided Image Filter on the HARP Reconfigurable Architecture Using OpenCL
Intel recently introduced the Heterogeneous Architecture Research Platform, HARP. In this platform, the Central Processing Unit and a Field-Programmable Gate Array are connected through a high-bandwidth, low-latency interconnect and both share DRAM memory. For this platform, Open Computing Language (OpenCL), a High-Level Synthesis (HLS) language, is made available. By making use of HLS, a faster […]
Aug, 5
Incremental Bounded Model Checking of Artificial Neural Networks in CUDA
Artificial Neural networks (ANNs) are powerful computing systems employed for various applications due to their versatility to generalize and to respond to unexpected inputs/patterns. However, implementations of ANNs for safety-critical systems might lead to failures, which are hardly predicted in the design phase since ANNs are highly parallel and their parameters are hardly interpretable. Here […]
Aug, 5
A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing
The convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy when compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done in centralized […]
Aug, 5
GLU3.0: Fast GPU-based Parallel Sparse LU Factorization for Circuit Simulation
In this article, we propose a new GPU-based sparse LU factorization method, called GLU3.0, solves the aforementioned problems. First, it introduces a much more efficient double-U dependency detection algorithm to make the detection much simpler. Second, we observe that the potential parallelism is different as the matrix factorization goes on. We then develop three different […]
Jul, 28
FBLAS: Streaming Linear Algebra on FPGA
Energy efficiency is one of the primary concerns when designing large scale computing systems. This makes reconfigurable hardware an attractive alternative to load-store architectures, as it allows eliminating expensive control and data movement overheads in computations. In practice, these devices are often not considered in the high-performance computing community, due to the steep learning curve […]
Jul, 28
A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices
Deep neural networks (DNNs) have seen tremendous industrial successes in various applications, including image recognition, machine translation, audio processing, etc. However, they require massive amounts of computations and take a lot of time to process. This quickly becomes a problem in mobile and handheld devices where real-time multimedia applications such as face detection, disaster management, […]
Jul, 28
NNS: The Case For Neural Network-based Sorting
CPU-SIMD/GPU/TPUs will be increasingly powerful. The algorithm using neural network and heterogeneous computing framework will bring significant performance improvement. In this paper we prove a novel neural network-based sorting algorithm, NNS which hold lower time complexity than O(nlogn) and easy implement in heterogeneous framework executed by CPU and GPU. Our initial results show that our […]
Jul, 28
GPU-Accelerated Atari Emulation for Reinforcement Learning
We designed and implemented a CUDA port of the Atari Learning Environment (ALE), a system for developing and evaluating deep reinforcement algorithms using Atari games. Our CUDA Learning Environment (CuLE) overcomes many limitations of existing CPU-based Atari emulators and scales naturally to multi-GPU systems. It leverages the parallelization capability of GPUs to run thousands of […]
Jul, 28
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we […]
Jul, 26
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing
In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed […]
Jul, 24
Assessing the feasibility of OpenCL CPU implementations for agent-based simulations
Agent-based modeling (ABM) is a bottom-up modeling approach, where each entity of the system being modeled is uniquely represented as a self-determining agent. Large scale emergent behavior in ABMs is population sensitive. As such, it is advisable that the number of agents in a simulation is able to reflect the reality of the system being […]
Jul, 21
Sorting on FPGAs using Merge Trees
Hardware Mergers can be used to implement sorting algorithms on Field-Programmable Gate Arrays (FPGAs) by inductively merging elements as in the Merge Sort algorithm.[1][2] These Hardware Mergers have also been laid out onto the FPGA in a complete binary tree pattern (called a Hardware Merge Tree) which further enhances performance of the sorting procedure by […]