Posts
May, 21
Architecture-Adaptive Code Variant Tuning
Code variants represent alternative implementations of a computation, and are common in high-performance libraries and applications to facilitate selecting the most appropriate implementation for a specific execution context (target architecture and input dataset). Automating code variant selection typically relies on machine learning to construct a model during an offline learning phase that can be quickly […]
May, 21
GPU-based Pedestrian Detection for Autonomous Driving
Pedestrian detection has gained a lot of prominence during the last few years. Besides the fact that it is one of the hardest tasks within computer vision, it involves huge computational costs. Obtaining acceptable real-time performance, measured in frames per second (fps), for the most advanced algorithms is nowadays a hard challenge. In this work, […]
May, 21
Performance Evaluation of Parallel Count Sort using GPU Computing with CUDA
OBJECTIVE: Sorting is considered a very important application in many areas of computer science. Nowadays parallelization of sorting algorithms using GPU computing, on CUDA hardware is increasing rapidly. The objective behind using GPU computing is that the users can get, the more speedup of the algorithms. METHODS: In this paper, we have focused on count […]
May, 21
Employing Directive Based Compression Solutions on Accelerators Global Memory under OpenACC
Programmers invest extensive development effort to optimize a GPU program to achieve peak performance. Achieving this requires an efficient usage of global memory, and avoiding memory bandwidth underutilization. The OpenACC programming model has been introduced to tackle the accelerators programming complexity. However, this models coarse-grained control on a program can make the memory bandwidth utilization […]
May, 17
GPU-Accelerated Feature Tracking
The motivation of this research is to prove that GPUs can provide significant speedup of long-executing image processing algorithms by way of parallelization and massive data throughput. This thesis accelerates the well-known KLT feature tracking algorithm using OpenCL and an NVidia GeForce GTX 780 GPU. KLT is a fast, efficient and accurate feature tracker but […]
May, 17
DeepLearningKit – an GPU Optimized Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift
In this paper we present DeepLearningKit – an open source framework that supports using pretrained deep learning models (convolutional neural networks) for iOS, OS X and tvOS. DeepLearningKit is developed in Metal in order to utilize the GPU efficiently and Swift for integration with applications, e.g. iOS-based mobile apps on iPhone/iPad, tvOS-based apps for the […]
May, 17
A Foray into Efficient Mapping of Algorithms to Hardware Platforms on Heterogeneous Systems
Heterogeneous computing can potentially offer significant performance and performance per watt improvements over homogeneous computing, but the question "what is the ideal mapping of algorithms to architectures?" remains an open one. In the past couple of years new types of computing devices such as FPGAs have come into general computing use. In this work we […]
May, 17
Attention-based NMT Models as Feature Functions in Phrase-based SMT
This paper describes the AMU-UEDIN submissions to the WMT 2016 shared task on news translation. We explore methods of decode-time integration of attention-based neural translation models with phrase-based statistical machine translation. Efficient batch-algorithms for GPU-querying are proposed and implemented. For English-Russian, the phrase-based system cannot surpass state-of-the-art stand-alone neural models. For the Russian-English task, our […]
May, 17
pyJac: analytical Jacobian generator for chemical kinetics
Accurate simulations of combustion phenomena require the use of detailed chemical kinetics in order to capture limit phenomena such as ignition and extinction as well as predict pollutant formation. However, the chemical kinetic models for hydrocarbon fuels of practical interest typically have large numbers of species and reactions and exhibit high levels of mathematical stiffness […]
May, 11
Improving GPU Performance: Reducing Memory Conflicts and Latency
Over the last decade Graphics Processing Units (GPUs) have evolved from fixed function computer graphics processors to energy efficient and programmable general purpose compute accelerators. During this period the number of cores in a GPU increased from 128 to 3072, an increase of 24x. However, the peak compute performance only increased by 12x, and memory […]
May, 11
LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning
LightNet is a lightweight, versatile and purely Matlab-based deep learning framework. The aim of the design is to provide an easy-to-understand, easy-to-use and efficient computational platform for deep learning research. The implemented framework supports major deep learning architectures such as Multilayer Perceptron Networks (MLP), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The framework […]
May, 11
An End-to-End System for Unconstrained Face Verifcation with Deep Convolutional Neural Networks
Over the last four years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems. This has been made possible due to the availability of large annotated datasets, a better understanding of the non-linear mapping between input images and class labels as well as the affordability […]