Posts

May, 2

Deep Learning in the Automotive Industry: Applications and Tools

Deep Learning refers to a set of machine learning techniques that utilize neural networks with many hidden layers for tasks such as image classification, speech recognition, and language understanding. Deep learning has proven very effective in these domains and is used pervasively by many Internet services. In this paper, we describe different automotive […]
Apr, 30

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems

Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today’s embedded systems. These architectures offer potential for energy efficient computing if the application task is mapped to the right core. Realizing such potential is challenging due to the complex and evolving nature of hardware and applications. This paper presents an automatic approach to […]
Apr, 30

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet poses the fundamental difficulty of reducing a large portion of the computation for pixel-wise label inference. We propose a compressed-PSPNet-based image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. […]
Apr, 30

Automatic source code adaptation for heterogeneous platforms

The demise of frequency scaling, which was the easiest way to improve computing performance, together with the growing gap between CPU and memory speeds and the increase in arithmetic intensity in current problems, has given rise to a new range of devices created to improve performance. Heterogeneous Computing (HC) and many-cores are examples of […]
Apr, 30

Accelerating Discrete Wavelet Transforms on Parallel Architectures

The 2-D discrete wavelet transform (DWT) lies at the heart of many image-processing algorithms. Until recently, several studies have compared the performance of this transform on various shared-memory parallel architectures, especially on graphics processing units (GPUs). All these studies, however, considered only separable calculation schemes. We show that corresponding separable parts can be […]
Apr, 30

Low-complexity Distributed Tomographic Backprojection for large datasets

In this manuscript we present a fast GPU implementation for tomographic reconstruction of large datasets using data obtained at the Brazilian synchrotron light source. The algorithm is distributed across a cluster with 4 GPUs through a fast pipeline implemented in the C programming language. Our algorithm is theoretically based on a recently discovered low-complexity formula, […]
Apr, 26

Developing a massive real-time crowd simulation framework on the GPU

Crowd simulations are used to imitate the behaviour of a large group of people. Such simulations are used in industries ranging from video-games to public security. In recent years, research has turned to the parallel nature of GPUs to simulate the behaviour of individuals in a crowd in parallel. This allows for real time visualisation […]
Apr, 26

Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers

The aim of this master’s thesis project was to expand the QPhiX library for twisted-mass fermions with and without clover term. To this end, I continued work initiated by Mario Schrock et al. [63]. In writing this thesis, I pursued two main goals. Firstly, I wanted to stress the intricate interplay of the four […]
Apr, 26

A Training Framework and Architectural Design for Distributed Deep Learning

Deep learning has recently gained a lot of attention on account of its incredible success in many complex data-driven applications, such as image classification. However, deep learning is quite user-hostile and is thus difficult to apply. For example, it is tricky and slow to train a large model which may consume a lot of memory. […]
Apr, 26

OpenCL-Based FPGA Accelerator for 3D FDTD with Periodic and Absorbing Boundary Conditions

The finite-difference time-domain (FDTD) method is a very popular way of numerically solving partial differential equations. FDTD has a low operational intensity, so its performance on CPUs and GPUs is often restricted by the memory bandwidth. Recently, deeply pipelined FPGA accelerators have shown a lot of success by exploiting streaming data flows in […]
Apr, 26

OpenCL JIT Compilation for Dynamic Programming Languages

Graphics Processing Units (GPUs) are powerful hardware for parallelizing and speeding up applications. However, programming these devices is too complex for most users, and the existing standards for GPU programming are available only for low-level languages such as C. Dynamic programming languages offer higher abstractions and functionality for many users. GPU programming is possible for dynamic […]
Apr, 23

4th International Conference on Biomedical and Bioinformatics Engineering (ICBBE), 2017

ICBBE 2017 aims to bring together innovative academics and industrial experts in the field of Biomedical and Bioinformatics Engineering in a common forum. The primary goal of the conference is to promote research and developmental activities in Biomedical and Bioinformatics Engineering. Another goal is to promote scientific information interchange between researchers, developers, engineers, students, and […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org