Posts
Sep, 1
Survey and Benchmarking of Machine Learning Accelerators
Advances in multicore processors and accelerators have opened the floodgates to greater exploration and application of machine learning techniques across a variety of applications. These advances, along with the breakdown of several trends, including Moore’s Law, have prompted an explosion of processors and accelerators that promise even greater computational and machine learning capabilities. These processors […]
Aug, 25
Position-Dependent Arrays and Their Application for High Performance Code Generation
Modern parallel hardware promises unprecedented performance, for the gifted few experts who can program it correctly. Code generators from high-level languages provide an attractive alternative, promising to deliver high performance automatically. Existing projects such as Accelerate, Futhark, Halide, or Lift show that this approach is feasible. Unfortunately, existing efforts focus on computations over tensors: regularly […]
Aug, 25
stdgpu: Efficient STL-like Data Structures on the GPU
Tremendous advances in parallel computing and graphics hardware have opened up several novel real-time GPU applications in the fields of computer vision and computer graphics, as well as augmented reality (AR) and virtual reality (VR). Although these applications build upon established open-source frameworks that provide highly optimized algorithms, they often come with custom, self-written data structures to […]
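Not from the stdgpu paper itself, but as a rough illustration of what "STL-like data structures on the GPU" means in practice, here is a minimal sketch using Thrust's device_vector, a different, widely available library that ships with the CUDA toolkit: a container that owns device memory and works with STL-style iterators and parallel algorithms.

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <iostream>

int main() {
    // STL-like container that owns device memory, analogous in spirit
    // to the host-side std::vector.
    thrust::device_vector<int> d_values(1 << 20);

    // Fill with 0, 1, 2, ... directly on the device.
    thrust::sequence(d_values.begin(), d_values.end());

    // Parallel reduction on the GPU; the long long init value sets the
    // accumulator type so the sum does not overflow.
    long long sum = thrust::reduce(d_values.begin(), d_values.end(), 0LL);

    std::cout << "sum = " << sum << std::endl;
    return 0;
}

stdgpu itself targets dynamic, device-side containers beyond this; see the paper for its actual interface.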
Aug, 25
Automatic Compiler Based FPGA Accelerator for CNN Training
Training of convolutional neural networks (CNNs) on embedded platforms to support on-device learning has recently gained vital importance. Designing flexible training hardware is much more challenging than inference hardware, due to design complexity and large computation/memory requirements. In this work, we present an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, […]
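The abstract's "16-bit fixed-point precision" is worth unpacking. Below is a small, hypothetical Q8.8 sketch (8 integer bits, 8 fractional bits; the paper's actual format and rounding may differ) showing how a fixed-point multiply-accumulate, the core operation of convolution layers, is carried out with integer arithmetic.

#include <cstdint>
#include <cstdio>

// Hypothetical Q8.8 format: 8 integer bits, 8 fractional bits.
using fix16 = int16_t;
constexpr int FRAC_BITS = 8;

fix16 to_fix(float v)   { return static_cast<fix16>(v * (1 << FRAC_BITS)); }
float to_float(fix16 v) { return static_cast<float>(v) / (1 << FRAC_BITS); }

// Multiply-accumulate: widen to 32 bits, then shift back to the Q8.8
// scale. Saturation/overflow handling is omitted for brevity.
fix16 mac(fix16 acc, fix16 a, fix16 b) {
    int32_t prod = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<fix16>(acc + (prod >> FRAC_BITS));
}

int main() {
    fix16 acc = to_fix(0.0f);
    acc = mac(acc, to_fix(1.5f), to_fix(0.25f));  // +0.375
    acc = mac(acc, to_fix(-2.0f), to_fix(0.5f));  // -1.0, total -0.625
    printf("accumulated = %f\n", to_float(acc));
    return 0;
}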
Aug, 25
On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-based FPGAs
Graph processing has attracted much attention recently due to its popularity in many big-data analytics applications. With high performance and energy efficiency, FPGAs can be an attractive architecture for graph processing. A number of techniques, such as caching in block RAMs (BRAMs) to reduce random accesses to global memory and multiple processing element (PE) […]
Aug, 25
Memory-Efficient Object-Oriented Programming on GPUs
Object-oriented programming is often regarded as too inefficient for high-performance computing (HPC), despite the fact that many important HPC problems have an inherent object structure. Our goal is to bring efficient, object-oriented programming to massively parallel SIMD architectures, especially GPUs. In this thesis, we develop various techniques for optimizing object-oriented GPU code. Most notably, we […]
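The thesis's own techniques are not reproduced here, but a standard ingredient of memory-efficient object layouts on SIMD hardware is the Structure-of-Arrays (SoA) layout, sketched below with a hypothetical Body type: storing each field contiguously lets the threads of a warp touch consecutive addresses (coalesced accesses) instead of striding over whole objects.

#include <cuda_runtime.h>

// Array-of-Structures: one object per element; a warp reading only the
// mass field strides through memory.
struct BodyAoS { float x, y, z, mass; };

// Structure-of-Arrays: each field stored contiguously, so neighbouring
// threads read neighbouring addresses.
struct BodySoA { float *x, *y, *z, *mass; };

__global__ void scale_mass_soa(BodySoA bodies, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        bodies.mass[i] *= factor;   // coalesced access to the mass array
    }
}

int main() {
    const int n = 1 << 20;
    BodySoA b;
    cudaMalloc(&b.x, n * sizeof(float));
    cudaMalloc(&b.y, n * sizeof(float));
    cudaMalloc(&b.z, n * sizeof(float));
    cudaMalloc(&b.mass, n * sizeof(float));
    cudaMemset(b.mass, 0, n * sizeof(float));

    scale_mass_soa<<<(n + 255) / 256, 256>>>(b, n, 0.5f);
    cudaDeviceSynchronize();

    cudaFree(b.x); cudaFree(b.y); cudaFree(b.z); cudaFree(b.mass);
    return 0;
}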
Aug, 21
Survey paper on Deep Learning on GPUs
The rise of deep learning (DL) has been fuelled by improvements in accelerators, and the GPU remains the most widely used accelerator for DL applications. We present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review 75+ techniques covering both inference and training, and both single GPU […]
Aug, 18
Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth
Supervised learning is the workhorse for regression and classification tasks, but the standard approach presumes ground truth for every measurement. In real-world applications, limitations due to expense, or the general infeasibility of obtaining ground truth for the specific application, are common. In the context of agricultural applications, yield monitoring is one such example, where simple physics-based measurements such […]
Aug, 18
Efficient Simulation of Fluid Flow and Transport in Heterogeneous Media Using Graphics Processing Units (GPUs)
Networks of interconnected resistors, springs and beams, or pores are standard models for studying scalar and vector transport processes in heterogeneous materials and media, such as fluid flow in porous media, and conduction, deformations, and electric and dielectric breakdown in heterogeneous solids. The computation time and required memory are two limiting factors that hinder the […]
Aug, 18
High Performance Computing via High Level Synthesis
As ever more powerful integrated circuits appear on the market, a growing range of applications, with very different requirements and workloads, make use of the available computing power. This thesis is devoted in particular to High-Performance Computing applications, where these trends are carried to the extreme. In this domain, the primary aspects to […]
Aug, 18
Visual Analysis Algorithms for Embedded Systems
The main contribution of this thesis is the design and development of an optimized framework for realizing deep neural classifiers on embedded platforms. Deep convolutional networks exhibit unmatched performance in image classification, but these deep classifiers demand huge computational power and memory storage. That is an issue on embedded devices due to limited […]
Aug, 18
SODECL: An Open Source Library for Calculating Multiple Orbits of a System of Stochastic Differential Equations in Parallel
Stochastic differential equations (SDEs) are widely used to model systems affected by random processes. In general, the analysis of an SDE model requires numerical solutions to be generated many times over multiple parameter combinations. However, this process often requires considerable computational resources to be practicable. Due to the embarrassingly parallel nature of the task, devices […]
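SODECL's actual API is not shown here. As a hedged sketch of why this task is embarrassingly parallel, the CUDA kernel below integrates one Euler-Maruyama orbit of a simple Ornstein-Uhlenbeck SDE, dx = -theta * x dt + sigma dW, per thread, each thread with its own parameter value, mirroring a sweep over parameter combinations.

#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <cstdio>
#include <cmath>

// One thread = one orbit with its own mean-reversion rate theta[i].
__global__ void euler_maruyama(float *x_final, const float *theta,
                               int n_orbits, int n_steps,
                               float dt, float sigma, unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_orbits) return;

    curandState rng;
    curand_init(seed, i, 0, &rng);   // independent stream per orbit

    float x = 1.0f;                  // common initial condition
    float sqrt_dt = sqrtf(dt);
    for (int s = 0; s < n_steps; ++s) {
        float dW = sqrt_dt * curand_normal(&rng);
        x += -theta[i] * x * dt + sigma * dW;   // Euler-Maruyama step
    }
    x_final[i] = x;
}

int main() {
    const int n_orbits = 1024, n_steps = 10000;
    float *d_x, *d_theta;
    cudaMalloc(&d_x, n_orbits * sizeof(float));
    cudaMalloc(&d_theta, n_orbits * sizeof(float));

    // One mean-reversion rate per orbit, set on the host.
    float h_theta[n_orbits];
    for (int i = 0; i < n_orbits; ++i) h_theta[i] = 0.5f + 0.001f * i;
    cudaMemcpy(d_theta, h_theta, sizeof(h_theta), cudaMemcpyHostToDevice);

    euler_maruyama<<<(n_orbits + 255) / 256, 256>>>(d_x, d_theta, n_orbits,
                                                    n_steps, 1e-3f, 0.2f, 42ULL);
    cudaDeviceSynchronize();

    float h_x[n_orbits];
    cudaMemcpy(h_x, d_x, sizeof(h_x), cudaMemcpyDeviceToHost);
    printf("orbit 0 endpoint: %f\n", h_x[0]);

    cudaFree(d_x); cudaFree(d_theta);
    return 0;
}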