
Posts

Aug, 21

Survey paper on Deep Learning on GPUs

The rise of deep learning (DL) has been fuelled by improvements in accelerators. The GPU remains the most widely used accelerator for DL applications. We present a survey of architecture and system-level techniques for optimizing DL applications on GPUs. We review 75+ techniques focused on both inference and training and for both single GPU […]
Aug, 18

Efficient Simulation of Fluid Flow and Transport in Heterogeneous Media Using Graphics Processing Units (GPUs)

Networks of interconnected resistors, springs and beams, or pores are standard models for studying scalar and vector transport processes in heterogeneous materials and media, such as fluid flow in porous media, and conduction, deformations, and electric and dielectric breakdown in heterogeneous solids. The computation time and required memory are two limiting factors that hinder the […]
Aug, 18

Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth

Supervised learning is the workhorse for regression and classification tasks, but the standard approach presumes ground truth for every measurement. In real-world applications, limitations due to expense or general infeasibility due to the specific application are common. In the context of agriculture applications, yield monitoring is one such example where simple physics-based measurements such […]
Aug, 18

High Performance Computing via High Level Synthesis

As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High-Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to […]
Aug, 18

Visual Analysis Algorithms for Embedded Systems

The main contribution of this thesis is the design and development of an optimized framework to realize deep neural classifiers on embedded platforms. Deep convolutional networks exhibit unmatched performance in image classification. However, these deep classifiers demand huge computational power and memory storage. That is an issue on embedded devices due to limited […]
Aug, 18

SODECL: An Open Source Library for Calculating Multiple Orbits of a System of Stochastic Differential Equations in Parallel

Stochastic differential equations (SDEs) are widely used to model systems affected by random processes. In general, the analysis of an SDE model requires numerical solutions to be generated many times over multiple parameter combinations. However, this process often requires considerable computational resources to be practicable. Due to the embarrassingly parallel nature of the task, devices […]
Aug, 11

Simple Iterative Incompressible Smoothed Particle Hydrodynamics

In this paper, a simple, robust, and general-purpose approach to implement the Incompressible Smoothed Particle Hydrodynamics (ISPH) method is proposed. The new approach is well suited for implementation on CPUs and GPUs. The method is matrix-free and uses an iterative formulation to set up and solve the pressure Poisson equation. A novel approach is used […]
Aug, 11

A Deep Learning Approach for Automatic Code Optimization in the Tiramisu Compiler

Modern compilers offer more and more code optimization possibilities. This enables better use of sophisticated hardware architectures and available resources in order to accelerate programs. It is difficult to predict which optimizations will be beneficial for a given program, as it depends on the program, the execution environment, interaction with other optimizations, and other factors. […]
Aug, 11

Live Migration of FPGA Applications

With the recent and growing trend of Field Programmable Gate Arrays (FPGAs) being deployed into data centers, cloud computing service providers are finding it difficult to manage these devices efficiently because traditional server management concepts and techniques are not yet available for FPGAs. In this thesis, we explore how to bring one of these […]
Aug, 11

Performance Comparison for Neuroscience Application Benchmarks

Researchers within the Human Brain Project and related projects have in the last couple of years expanded their needs for high-performance computing infrastructures. The needs arise from a diverse set of science challenges that range from large-scale simulations of brain models to processing of extreme-scale experimental data sets. The ICEI project, which is in the […]
Aug, 11

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

High-performance graph algorithms are challenging to implement on new parallel hardware such as GPUs for three reasons: (1) difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic ratio. To address these challenges, GraphBLAS is an innovative, on-going effort by the […]
Aug, 5

Parallelization of Coherent Point Drift for patient registration

Point set registration is a central part of any application where the correspondence between two data point sets is of interest, for instance patient data from medical examinations. There exist numerous algorithms that aim to solve the registration problem, one of which is the Coherent Point Drift (CPD) algorithm. In this thesis a […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
