Posts
Jul, 5
IEEE 2nd International Conference on Control and Robotics Engineering (ICCRE), 2017
Publication: All submissions will be peer reviewed, and all accepted papers will be published in the ICCRE 2017 conference proceedings and reviewed by the IEEE Conference Publication Program for IEEE Xplore and Ei Compendex. ICCRE 2016 has been included in IEEE Xplore. Conference Schedule: April 1, 2017: Registration and collecting conference materials April […]
Jul, 5
Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2
By reaching near-atomic resolution for a wide range of specimens, single-particle cryo-EM structure determination is transforming structural biology. However, the necessary calculations come at increased computational costs, introducing a bottleneck that is currently limiting throughput and the development of new methods. Here, we present an implementation of the RELION image processing software that uses graphics […]
Jul, 5
An Adaptive Multi-Spline Refinement Algorithm in Simulation Based Sailboat Trajectory Optimization Using Onboard Multi-Core Computer Systems
A new dynamic-programming-based parallel algorithm adapted to on-board heterogeneous computers for simulation-based trajectory optimization is studied in the context of "high-performance sailing". The algorithm uses a new discrete space of continuously differentiable functions, called multi-splines, as its search space representation. A basic version of the algorithm is presented in detail (pseudo-code, […]
Jul, 5
A Test Drive of the NVIDIA Jetson TX1 Developer Kit for Deep Learning and Computer Vision Applications
The Jetson TX1 module is NVIDIA’s latest processor system-on-module for embedded applications, based on the Tegra X1 chip. The Jetson TX1 Developer Kit is a low-cost, feature-rich development kit based on the Jetson TX1 module. BDTI, a technology analysis firm, used the Jetson TX1 Developer Kit to develop a deep-learning-based computer vision application: a camera that […]
Jul, 5
Time Predictability of GPU Kernel on an HSA Compliant Platform
In recent years, the importance of utilizing more computational power in smaller computer systems has increased. Along with packing more computational power into smaller packages, the ability to combine more than one type of processor unit has become more popular in the industry. By combining them, one achieves greater power efficiency as well as gaining more […]
Jul, 5
OpenCL Implementation of a Parallel Universal Kriging Algorithm for Massive Spatial Data Interpolation on Heterogeneous Systems
In some digital Earth engineering applications, spatial interpolation algorithms are required to process and analyze large amounts of data. Due to its powerful computing capacity, heterogeneous computing has been used in many applications for data processing in various fields. In this study, we explore the design and implementation of a parallel universal kriging spatial interpolation […]
Jun, 30
GPRM: a high performance programming framework for manycore processors
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient […]
Jun, 30
Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU
GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism – such as flat or two-level parallelism – and a degree of parallelism that can be statically determined based on the size of the input dataset. However, the effective use of GPUs for algorithms exhibiting complex patterns of parallelism, possibly known only […]
Jun, 30
DeepBE: Learning Deep Binary Encoding for Multi-Label Classification
Track 2 and track 3 of ChaLearn 2016 can be considered multi-label classification problems. We present a framework for learning deep binary encoding (DeepBE) to deal with multi-label problems by transforming multi-labels into single labels. The transformation of DeepBE is in a hidden pattern, which can be well addressed by deep convolutional neural […]
Jun, 30
Modified Levels of Parallel Odd-Even Transposition Sorting Network (OETSN) with GPU Computing using CUDA
Sorting huge datasets requires an enormous amount of time. The time needed for this task can be minimised using parallel processing devices such as GPUs. The odd-even transposition sorting network algorithm is based on the idea that each level uses an equal number of comparators to arrange data. The existing parallel OETSN algorithm compares the elements […]
Jun, 30
Persistent RNNs: Stashing Recurrent Weights On-Chip
This paper introduces a new technique for mapping deep recurrent neural networks (RNNs) efficiently onto GPUs. We show how it is possible to achieve substantially higher computational throughput at low mini-batch sizes than direct implementations of RNNs based on matrix multiplications. The key to our approach is the use of persistent computational kernels that exploit […]
Jun, 28
Parallel and Distributed Deep Learning
The goal of this report is to explore ways to parallelize/distribute deep learning in multi-core and distributed settings. We empirically analyze the speedup in training a CNN using a conventional single-core CPU and a GPU, and provide practical suggestions to improve training times. In the distributed setting, we study and analyze synchronous and asynchronous weight […]