Posts
May, 2
GPU accelerated atmospheric chemical kinetics in the ECHAM/MESSy (EMAC) Earth system model
This paper presents an application of GPU accelerators in Earth system modelling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate-chemistry model simulations. We developed a software package that automatically generates CUDA kernels to numerically integrate atmospheric chemical kinetics in the global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC), used […]
May, 2
Speeding up a few orders of magnitude the Jacobi method: high order Chebyshev-Jacobi over GPUs
In this technical note, we show how to achieve a remarkable speedup when solving elliptic partial differential equations with finite differences by combining the Chebyshev-Jacobi method with high-order discretizations and implementing it in parallel on GPUs.
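To make the idea concrete, here is a minimal NumPy sketch of Chebyshev-accelerated Jacobi iteration for the 2-D Poisson problem -Δu = f on the unit square with zero Dirichlet boundaries. It assumes a second-order 5-point stencil rather than the high-order discretizations used in the note, and it uses the classical Chebyshev semi-iterative recurrence for the relaxation weights, with rho = cos(πh) as the spectral radius of the Jacobi iteration matrix for this stencil. The function names `jacobi_sweep` and `chebyshev_jacobi` are hypothetical; this is not the authors' GPU code.

```python
import numpy as np

def jacobi_sweep(u, f, h):
    """One Jacobi update for -Δu = f with the 5-point stencil.

    u and f are (n+2) x (n+2) grids including the boundary; boundary
    rows and columns of the result are kept at zero (Dirichlet BC).
    """
    new = np.zeros_like(u)
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        + h * h * f[1:-1, 1:-1]
    )
    return new

def chebyshev_jacobi(f, h, rho, iters):
    """Chebyshev semi-iterative acceleration of Jacobi (two-term recurrence).

    rho is (an upper bound on) the spectral radius of the Jacobi iteration
    matrix; the weights follow omega_1 = 1, omega_2 = 1 / (1 - rho^2 / 2),
    omega_{k+1} = 1 / (1 - rho^2 * omega_k / 4).
    """
    u_prev = np.zeros_like(f)
    u = jacobi_sweep(u_prev, f, h)          # step 1: plain Jacobi (omega_1 = 1)
    omega = 1.0 / (1.0 - 0.5 * rho**2)      # omega_2
    for _ in range(iters - 1):
        # Extrapolate between the new Jacobi iterate and the iterate two steps back.
        u_next = omega * (jacobi_sweep(u, f, h) - u_prev) + u_prev
        u_prev, u = u, u_next
        omega = 1.0 / (1.0 - 0.25 * rho**2 * omega)
    return u
```

With n = 31 interior points per side and rho = cos(πh), a few hundred accelerated sweeps reach round-off-level residuals, whereas plain Jacobi contracts the error only by about rho per sweep and needs thousands of sweeps for the same accuracy. Each sweep remains a purely local stencil update, which is the property that makes the method map well onto GPUs.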
May, 2
Accelerating gravitational microlensing simulations using the Xeon Phi coprocessor
Recently, Graphics Processing Units (GPUs) have been used to speed up very CPU-intensive gravitational microlensing simulations. In this work, we use the Xeon Phi coprocessor to accelerate such simulations and compare its performance on a microlensing code with that of NVIDIA’s GPUs. For the selected set of parameters evaluated in our experiment, we find that […]
May, 2
Deep Learning in the Automotive Industry: Applications and Tools
Deep Learning refers to a set of machine learning techniques that use neural networks with many hidden layers for tasks such as image classification, speech recognition, and language understanding. Deep learning has proven very effective in these domains and is pervasively used by many Internet services. In this paper, we describe different automotive […]
Apr, 30
Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today’s embedded systems. These architectures offer potential for energy-efficient computing if the application task is mapped to the right core. Realizing this potential is challenging due to the complex and evolving nature of hardware and applications. This paper presents an automatic approach to […]
Apr, 30
Automatic source code adaptation for heterogeneous platforms
The end of frequency scaling, once the easiest way to improve computing performance, together with the growing gap between CPU and memory speeds and the increasing arithmetic intensity of current problems, has given rise to a new range of devices designed to improve performance. Heterogeneous Computing (HC) and many-cores are examples of […]
Apr, 30
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
In this paper, we focus on the challenging task of real-time semantic segmentation. It has many practical applications, yet it poses the fundamental difficulty of substantially reducing the computation required for pixel-wise label inference. We propose a compressed-PSPNet-based image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. […]
Apr, 30
Accelerating Discrete Wavelet Transforms on Parallel Architectures
The 2-D discrete wavelet transform (DWT) lies at the heart of many image-processing algorithms. To date, several studies have compared the performance of this transform on various shared-memory parallel architectures, especially graphics processing units (GPUs). All of these studies, however, considered only separable calculation schemes. We show that corresponding separable parts can be […]
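As an illustration of what a separable calculation scheme means, the sketch below computes one level of a 2-D Haar DWT in two ways: separably (a 1-D transform along rows, then along columns) and non-separably (each output coefficient computed directly from its 2×2 input block). The Haar wavelet is an assumption chosen for brevity, the function names are hypothetical, and the subband naming follows only one common convention; the paper's wavelets and schemes may differ.

```python
import numpy as np

def haar_1d(x, axis):
    """One level of the orthonormal 1-D Haar transform along `axis` (even length)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_2d_separable(img):
    """Separable scheme: a horizontal pass, then a vertical pass on each band."""
    lo, hi = haar_1d(img, axis=1)   # rows: low/high-pass along x
    ll, lh = haar_1d(lo, axis=0)    # columns of the low band
    hl, hh = haar_1d(hi, axis=0)    # columns of the high band
    return ll, lh, hl, hh

def haar_2d_nonseparable(img):
    """Non-separable scheme: every coefficient computed directly from a 2x2 block."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)
```

The separable version makes two passes over memory with an intermediate result, while the non-separable version reads each 2×2 block once; on GPUs, where memory traffic often dominates, that difference is the kind of trade-off such comparisons examine.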
Apr, 30
Low-complexity Distributed Tomographic Backprojection for large datasets
In this manuscript, we present a fast GPU implementation for tomographic reconstruction of large datasets using data obtained at the Brazilian synchrotron light source. The algorithm is distributed across a cluster with 4 GPUs through a fast pipeline implemented in the C programming language. Our algorithm is theoretically based on a recently discovered low-complexity formula, […]
Apr, 26
Developing a massive real-time crowd simulation framework on the GPU
Crowd simulations are used to imitate the behaviour of a large group of people. Such simulations are used in industries ranging from video games to public security. In recent years, research has exploited the parallel nature of GPUs to simulate the behaviour of the individuals in a crowd concurrently. This allows for real-time visualisation […]
Apr, 26
Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers
The aim of this master’s thesis project was to extend the QPhiX library to twisted-mass fermions with and without a clover term. To this end, I continued the work initiated by Mario Schrock et al. [63]. In writing this thesis, I pursued two main goals. First, I wanted to stress the intricate interplay of the four […]
Apr, 26
A Training Framework and Architectural Design for Distributed Deep Learning
Deep learning has recently gained a lot of attention because of its remarkable success in many complex data-driven applications, such as image classification. However, deep learning is quite user-hostile and thus difficult to apply. For example, training a large model, which may consume a lot of memory, is tricky and slow. […]