Posts

Aug, 25

stdgpu: Efficient STL-like Data Structures on the GPU

Tremendous advances in parallel computing and graphics hardware have opened up several novel real-time GPU applications in the fields of computer vision, computer graphics as well as augmented reality (AR) and virtual reality (VR). Although these applications build upon established open-source frameworks that provide highly optimized algorithms, they often come with custom self-written data structures to […]
Aug, 25

Automatic Compiler Based FPGA Accelerator for CNN Training

Training of convolutional neural networks (CNNs) on embedded platforms to support on-device learning has recently become vitally important. Designing flexible training hardware is much more challenging than designing inference hardware, due to design complexity and large computation/memory requirements. In this work, we present an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, […]
Aug, 25

Memory-Efficient Object-Oriented Programming on GPUs

Object-oriented programming is often regarded as too inefficient for high-performance computing (HPC), despite the fact that many important HPC problems have an inherent object structure. Our goal is to bring efficient, object-oriented programming to massively parallel SIMD architectures, especially GPUs. In this thesis, we develop various techniques for optimizing object-oriented GPU code. Most notably, we […]
Aug, 25

On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-based FPGAs

Graph processing has attracted much attention recently due to its popularity in many big data analytic applications. With high performance and energy efficiency, FPGAs can be an attractive architecture for graph processing. A number of techniques such as caching using block RAMs (BRAMs) to reduce random accesses of global memory and multiple processing element (PE) […]
Aug, 21

Survey paper on Deep Learning on GPUs

The rise of deep learning (DL) has been fuelled by improvements in accelerators. The GPU remains the most widely used accelerator for DL applications. We present a survey of architecture and system-level techniques for optimizing DL applications on GPUs. We review 75+ techniques focused on both inference and training and for both single GPU […]
Aug, 18

Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth

Supervised learning is the workhorse for regression and classification tasks, but the standard approach presumes ground truth for every measurement. In real-world applications, limitations due to expense or general infeasibility due to the specific application are common. In the context of agriculture applications, yield monitoring is one such example where simple physics-based measurements such […]
Aug, 18

Efficient Simulation of Fluid Flow and Transport in Heterogeneous Media Using Graphics Processing Units (GPUs)

Networks of interconnected resistors, springs and beams, or pores are standard models for studying scalar and vector transport processes in heterogeneous materials and media, such as fluid flow in porous media, and conduction, deformations, and electric and dielectric breakdown in heterogeneous solids. The computation time and required memory are two limiting factors that hinder the […]
Aug, 18

High Performance Computing via High Level Synthesis

As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High-Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to […]
Aug, 18

Visual Analysis Algorithms for Embedded Systems

The main contribution of this thesis is the design and development of an optimized framework to realize deep neural classifiers on embedded platforms. Deep convolutional networks exhibit unmatched performance in image classification. However, these deep classifiers demand huge computational power and memory storage. That is an issue on embedded devices due to limited […]
Aug, 18

SODECL: An Open Source Library for Calculating Multiple Orbits of a System of Stochastic Differential Equations in Parallel

Stochastic differential equations (SDEs) are widely used to model systems affected by random processes. In general, the analysis of an SDE model requires numerical solutions to be generated many times over multiple parameter combinations. However, this process often requires considerable computational resources to be practicable. Due to the embarrassingly parallel nature of the task, devices […]
Aug, 11

Simple Iterative Incompressible Smoothed Particle Hydrodynamics

In this paper, a simple, robust, and general-purpose approach to implement the Incompressible Smoothed Particle Hydrodynamics (ISPH) method is proposed. The new approach is well suited for implementation on CPUs and GPUs. The method is matrix-free and uses an iterative formulation to set up and solve the pressure Poisson equation. A novel approach is used […]
Aug, 11

A Deep Learning Approach for Automatic Code Optimization in the Tiramisu Compiler

Modern compilers offer more and more code optimization possibilities. This enables better use of sophisticated hardware architectures and available resources in order to accelerate programs. It is difficult to predict which optimizations will be beneficial for a given program, as it depends on the program, the execution environment, interaction with other optimizations, and other factors. […]

* * *

HGPU group © 2010-2019 hgpu.org

All rights belong to the respective authors
