Posts
Jul, 5
OpenCL Implementation of a Parallel Universal Kriging Algorithm for Massive Spatial Data Interpolation on Heterogeneous Systems
In some digital Earth engineering applications, spatial interpolation algorithms are required to process and analyze large amounts of data. Owing to its high computing capacity, heterogeneous computing has been adopted for data processing in many fields. In this study, we explore the design and implementation of a parallel universal kriging spatial interpolation […]
Jun, 30
GPRM: a high performance programming framework for manycore processors
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient […]
Jun, 30
Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU
GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism – such as flat or two-level parallelism – and a degree of parallelism that can be statically determined based on the size of the input dataset. However, the effective use of GPUs for algorithms exhibiting complex patterns of parallelism, possibly known only […]
Jun, 30
DeepBE: Learning Deep Binary Encoding for Multi-Label Classification
Tracks 2 and 3 of ChaLearn 2016 can be considered multi-label classification problems. We present a framework for learning deep binary encoding (DeepBE) that handles multi-label problems by transforming multi-labels into single labels. The DeepBE transformation follows a hidden pattern, which can be well addressed by deep convolutional neural […]
Jun, 30
Modified Levels of Parallel Odd-Even Transposition Sorting Network (OETSN) with GPU Computing using CUDA
Sorting huge datasets requires an enormous amount of time. The time needed for this task can be reduced using parallel processing devices such as GPUs. The odd-even transposition sorting network algorithm is based on the idea that each level uses an equal number of comparators to arrange data. The existing parallel OETSN algorithm compares the elements […]
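For readers unfamiliar with the baseline network, the textbook odd-even transposition sort can be sketched as follows. This is a minimal sequential simulation in Python, not the modified-levels variant the paper proposes, and the function name is hypothetical. Each outer iteration corresponds to one level of the network; because the compare-exchange pairs within a level are disjoint, a CUDA implementation could assign one thread per pair.

```python
def odd_even_transposition_sort(data):
    """Sort a list using n levels of compare-exchange operations.

    Each level touches disjoint adjacent pairs, so all comparators in a level
    are independent of one another. Here the levels run sequentially on the CPU.
    """
    a = list(data)  # work on a copy so the input is left unchanged
    n = len(a)
    for level in range(n):
        # Even levels compare pairs (0,1), (2,3), ...; odd levels compare (1,2), (3,4), ...
        start = level % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

For example, `odd_even_transposition_sort([5, 2, 9, 1, 5, 6])` returns `[1, 2, 5, 5, 6, 9]`. Running all n levels is what guarantees a sorted result for n elements in the worst case, such as a reverse-sorted input.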
Jun, 30
Persistent RNNs: Stashing Recurrent Weights On-Chip
This paper introduces a new technique for mapping Deep Recurrent Neural Networks (RNN) efficiently onto GPUs. We show how it is possible to achieve substantially higher computational throughput at low mini-batch sizes than direct implementations of RNNs based on matrix multiplications. The key to our approach is the use of persistent computational kernels that exploit […]
Jun, 28
Parallel and Distributed Deep Learning
The goal of this report is to explore ways to parallelize and distribute deep learning in multi-core and distributed settings. We have empirically analyzed the speedup in training a CNN using a conventional single-core CPU and a GPU, and we provide practical suggestions to improve training times. In the distributed setting, we study and analyze synchronous and asynchronous weight […]
Jun, 28
A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves
The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today’s manycore platforms, such as GPUs, is not an easy task since computing a component of the solution may depend on previously computed components, enforcing a degree of sequential processing. As a consequence, […]
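The dependency the abstract describes shows up directly in serial forward substitution. The sketch below is a minimal Python illustration, assuming a lower-triangular matrix in CSR format with every diagonal entry stored; the function names are hypothetical. It solves L x = b and also computes each row's level in the dependency graph. Rows that share a level do not depend on one another, which is the parallelism that level-set schedulers exploit; this is not the paper's synchronization-free algorithm.

```python
def sptrsv_lower_csr(row_ptr, col_idx, values, b):
    """Solve L x = b for sparse lower-triangular L in CSR format (serial)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]
            elif j < i:
                # x[j] must already be solved: this is the sequential dependency.
                acc -= values[k] * x[j]
            else:
                raise ValueError(f"entry ({i}, {j}) lies above the diagonal")
        if diag is None or diag == 0:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag
    return x


def dependency_levels(row_ptr, col_idx):
    """Level of row i = 1 + max level of the earlier rows it depends on.

    Rows with equal levels are mutually independent and could be solved in parallel.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level
```

For the 4-by-4 matrix [[2, 0, 0, 0], [1, 3, 0, 0], [0, 0, 4, 0], [0, 2, 1, 5]] (`row_ptr = [0, 1, 3, 4, 7]`, `col_idx = [0, 0, 1, 2, 1, 2, 3]`, `values = [2, 1, 3, 4, 2, 1, 5]`) and b = [2, 7, 12, 27], the solver returns `[1.0, 2.0, 3.0, 4.0]` and the levels are `[0, 1, 0, 2]`: rows 0 and 2 can be solved concurrently, but row 3 must wait for row 1.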
Jun, 28
Accelerating High-Throughput Computing through OpenCL
As computing diverges from standard CPUs to encompass GPUs and other accelerators, the need to integrate these otherwise unused resources into existing systems becomes apparent. This paper presents the implementation of an HTCondor pool with GPU execution capabilities through OpenCL. The implementation is discussed from both the system-setup and the software-design standpoints. […]
Jun, 28
GPU Based Real-Time Welding Simulation with Smoothed-Particle Hydrodynamics
Welder training is essential to industrial development. A good welder will build robust workpieces that ensure the safety and stability of the product. However, training a welder requires a great deal of time and access to professional welding equipment. Therefore, it is desirable to have a training system that is economical and easy to use. After […]
Jun, 28
Parallelizing Map Projection of Raster Data on Multi-core CPU and GPU Parallel Programming Frameworks
Map projections lie at the core of geographic information systems, and numerous projections are in use today. Reprojection between map projections is a recurring task in a geographic information system, and it can be parallelized on multi-core CPUs and GPUs. This thesis implements a parallel analytic reprojection algorithm for raster data in C/C++ with the parallel […]
Jun, 22
Efficient and High-quality Sparse Graph Coloring on the GPU
Graph coloring has been broadly used to discover concurrency in parallel computing. To speed up graph coloring for large-scale datasets, parallel algorithms have been proposed that leverage modern GPUs. Existing GPU implementations either have limited performance or yield unsatisfactory coloring quality (too many colors assigned). We present a work-efficient parallel graph coloring implementation on GPUs with […]
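To make "coloring quality" concrete, here is a minimal sequential first-fit (greedy) coloring in Python. It is a classic baseline heuristic, not the paper's GPU algorithm, and the adjacency-dict input format and function names are assumptions. Each vertex receives the smallest color not used by an already-colored neighbor. Vertices sharing a color form an independent set, so each color class could be processed concurrently, and fewer colors means fewer sequential steps.

```python
def greedy_coloring(adjacency):
    """First-fit coloring of an undirected graph.

    adjacency: dict mapping each vertex to an iterable of its neighbors.
    Returns a dict mapping each vertex to a color 0, 1, 2, ...
    """
    color = {}
    for v in adjacency:  # vertices are visited in dict insertion order
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:  # smallest color not taken by a colored neighbor
            c += 1
        color[v] = c
    return color


def color_classes(color):
    """Group vertices by color; each group is an independent set."""
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    return [classes[c] for c in sorted(classes)]
```

On the 4-cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}`, `greedy_coloring` assigns `{0: 0, 1: 1, 2: 0, 3: 1}`, giving the two classes `[[0, 2], [1, 3]]`. The number of colors the greedy heuristic uses depends on the order in which vertices are visited.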