Posts
Oct, 4
Deep Dynamic Neural Networks for Gesture Segmentation and Recognition
The purpose of this paper is to describe a novel method called Deep Dynamic Neural Networks (DDNN) for Track 3 of the ChaLearn Looking at People 2014 challenge [1]. A generalised semi-supervised hierarchical dynamic framework is proposed for simultaneous gesture segmentation and recognition, taking both skeleton and depth images as input modules. First, Deep Belief […]
Oct, 4
GPU Accelerated Radio Wave Propagation Modeling Using Ray Tracing
Radar producers, most of which are in the defense industry, need a radar environment simulator to test their products during development. Such a simulator lets them avoid costly field tests. To develop a radar environment simulator, radio wave propagation must be modeled. However, this is a computationally expensive and time-consuming […]
Oct, 4
Parallel Shortest Path Algorithm for Voronoi Diagrams with Generalized Distance Functions
Voronoi diagrams are fundamental data structures in computational geometry with applications in many different areas. Recent soft object simulation algorithms for real-time physics engines require the computation of Voronoi diagrams over 3D images with non-Euclidean distances. In this case, the computation must be performed over a graph, where the edges encode the required distance information. […]
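The following sketch roughly illustrates how a Voronoi diagram can be computed over a graph. It runs a sequential multi-source Dijkstra search: every seed starts at distance zero, and each vertex is claimed by whichever seed reaches it first. The adjacency-list layout, the `graph_voronoi` name and the sample weights are hypothetical. The edge weights stand in for whatever generalized distance the graph encodes. This is not the parallel shortest-path algorithm the paper proposes.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: vertex -> list of (neighbour, edge weight); weights encode the distance.
Graph = Dict[int, List[Tuple[int, float]]]

def graph_voronoi(graph: Graph, seeds: List[int]) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Assign every reachable vertex to its nearest seed via multi-source Dijkstra."""
    dist: Dict[int, float] = {}
    owner: Dict[int, int] = {}
    heap: List[Tuple[float, int, int]] = [(0.0, s, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        d, v, seed = heapq.heappop(heap)
        if v in dist:
            continue  # already settled by a closer (or equally close) seed
        dist[v] = d
        owner[v] = seed  # v belongs to this seed's Voronoi cell
        for u, w in graph.get(v, []):
            if u not in dist:
                heapq.heappush(heap, (d + w, u, seed))
    return dist, owner

# Usage: an undirected path 0-1-2-3-4 with one expensive edge between 2 and 3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 5.0), (3, 4, 1.0)]
g: Graph = {}
for a, b, w in edges:
    g.setdefault(a, []).append((b, w))
    g.setdefault(b, []).append((a, w))
dist, owner = graph_voronoi(g, [0, 4])
```

With seeds 0 and 4, vertices 0, 1 and 2 fall in seed 0's cell, and vertices 3 and 4 fall in seed 4's cell. The heavy edge, not the hop count, decides the boundary.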
Oct, 4
Teaching Parallel Programming Using Java
This paper presents an overview of the "Applied Parallel Computing" course taught to final-year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that […]
Oct, 4
A massively parallel algorithm for constructing the BWT of large string sets
We present a new scalable, lightweight algorithm to incrementally construct the BWT and FM-index of large string sets such as those produced by Next Generation Sequencing. The algorithm is designed for massive parallelism and can effectively exploit the combination of low-capacity, high-bandwidth memory and slower external system memory typical of GPU-accelerated systems. […]
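For readers unfamiliar with the BWT of a string set, the naive sketch below shows how one can be built. It makes three assumptions. First, each string is terminated with its own sentinel $_i. Second, the sentinels are ordered by string index. Third, every sentinel sorts before every character. The sketch sorts all rotations of all terminated strings and reads off the last symbol of each. The function name `multistring_bwt` is hypothetical. This quadratic-memory approach is exactly what a scalable, incremental algorithm like the one described avoids.

```python
from typing import List, Tuple, Union

# A symbol is (0, i) for the terminator $_i of string i, or (1, ch) for a character.
# Sentinels therefore sort before all characters, and $_i < $_j whenever i < j.
Symbol = Tuple[int, Union[int, str]]

def multistring_bwt(strings: List[str]) -> str:
    """Naive BWT of a string collection: sort every rotation of each s_i + $_i."""
    rotations: List[List[Symbol]] = []
    for i, s in enumerate(strings):
        text: List[Symbol] = [(1, c) for c in s] + [(0, i)]
        for k in range(len(text)):
            rotations.append(text[k:] + text[:k])
    # Each rotation holds exactly one unique sentinel, so no two rotations compare equal.
    rotations.sort()
    # Emit the last symbol of each sorted rotation, printing every sentinel as '$'.
    return "".join("$" if kind == 0 else str(val) for kind, val in (r[-1] for r in rotations))
```

For a single string, `multistring_bwt(["BANANA"])` yields `"ANNB$AA"`, the textbook BWT of `BANANA$`.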
Oct, 3
Fast Automatic Heuristic Construction Using Active Learning
Building effective optimization heuristics is a challenging task that often takes developers several months, if not years, to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data. However, obtaining this data can take months per platform. This is becoming an ever more critical problem, and if no solution […]
Oct, 3
Microarchitectural Performance Characterization of Irregular GPU Kernels
GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of […]
Oct, 3
Stable large-scale solver for Ginzburg-Landau equations for superconductors
Understanding the interaction of vortices with inclusions in type-II superconductors is a major outstanding challenge both for fundamental science and energy applications. At application-relevant scales, the long-range interactions between a dense configuration of vortices and the dependence of their behavior on external parameters, such as temperature and an applied magnetic field, are all important to […]
Oct, 3
A fast GPU-based Monte Carlo simulation of proton transport with detailed modeling of non-elastic interactions
Purpose: Very fast Monte Carlo (MC) simulations of proton transport have been implemented recently on GPUs. However, these usually use simplified models for non-elastic (NE) proton-nucleus interactions. Our primary goal is to build a GPU-based proton transport MC with detailed modeling of elastic and NE collisions. Methods: Using CUDA, we implemented GPU kernels for these […]
Oct, 3
A stencil-based implementation of Parareal in the C++ domain specific embedded language STELLA
In view of the rapid rise of the number of cores in modern supercomputers, time-parallel methods that introduce concurrency along the temporal axis are becoming increasingly popular. For the solution of time-dependent partial differential equations, these methods can add another direction for concurrency on top of spatial parallelization. The paper presents an implementation of the […]
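The core idea of Parareal can be sketched on a scalar ODE y' = λy using two propagators and a correction step:

- A cheap coarse propagator G predicts the solution serially.
- An accurate fine propagator F is applied to every time slice independently. This is the step that can run in parallel.
- The update U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k) corrects the prediction.

The explicit-Euler propagators, the function names and the test problem are illustrative assumptions. This is not the STELLA stencil implementation from the paper.

```python
from typing import Callable, List

Propagator = Callable[[float, float], float]  # (state, dt) -> state after dt

def euler(lam: float, steps: int) -> Propagator:
    """Explicit Euler for y' = lam * y, taking `steps` substeps per time slice."""
    def prop(y: float, dt: float) -> float:
        h = dt / steps
        for _ in range(steps):
            y = y + h * lam * y
        return y
    return prop

def parareal(y0: float, t_end: float, n_slices: int,
             coarse: Propagator, fine: Propagator, iters: int) -> List[float]:
    """Return the solution at the slice boundaries after `iters` Parareal iterations."""
    dt = t_end / n_slices
    # Iteration 0: a cheap serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], dt))
    for _ in range(iters):
        # Fine solves on different slices are independent: this is the parallel part.
        f = [fine(u[n], dt) for n in range(n_slices)]
        g_old = [coarse(u[n], dt) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):
            # Serial correction sweep: G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
            new.append(coarse(new[n], dt) + f[n] - g_old[n])
        u = new
    return u

coarse = euler(-1.0, 1)   # one Euler step per slice
fine = euler(-1.0, 100)   # 100 Euler substeps per slice
u = parareal(1.0, 1.0, 4, coarse, fine, iters=4)
```

After as many iterations as there are slices, the result matches a serial run of the fine propagator up to rounding. Any speedup comes from stopping after far fewer iterations while the fine solves run concurrently.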
Oct, 2
Fast Estimation of Gaussian Mixture Model Parameters on GPU using CUDA
Gaussian Mixture Models (GMMs) are widely used by scientists, e.g., in statistics toolkits and data mining procedures. To estimate the parameters of a GMM, Maximum Likelihood (ML) training is often used, more precisely the Expectation-Maximization (EM) algorithm. Nowadays, many tasks work with huge datasets, which makes the estimation process time-consuming […]
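The EM algorithm mentioned above alternates two steps. The E-step computes each component's responsibility for each data point. The M-step re-estimates weights, means and variances in closed form. A minimal CPU sketch for one-dimensional data in NumPy follows. The function name `gmm_em_1d`, the initialization (given means, unit variances, equal weights) and the fixed iteration count are assumptions; this is not the paper's CUDA implementation. The E-step work is independent for each data point, which is what makes EM a good fit for GPUs.

```python
import numpy as np

def gmm_em_1d(x: np.ndarray, init_means: np.ndarray, n_iter: int = 100):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k = init_means.shape[0]
    mu = init_means.astype(float).copy()
    var = np.ones(k)            # assumed initial variances
    w = np.full(k, 1.0 / k)     # assumed equal initial weights
    for _ in range(n_iter):
        # E-step: log(w_j * N(x_i | mu_j, var_j)) for all pairs, shape (n, k).
        log_p = (-0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var
                 + np.log(w))
        log_p -= log_p.max(axis=1, keepdims=True)  # avoid underflow before exp
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)          # responsibilities; rows sum to 1
        # M-step: closed-form ML updates weighted by the responsibilities.
        nk = r.sum(axis=0)
        w = nk / x.shape[0]
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Usage: two well-separated synthetic clusters centred at -5 and +5.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
w, mu, var = gmm_em_1d(x, np.array([-1.0, 1.0]))
```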
Sep, 30
Parallel QuadTree Encoding of Large-Scale Raster Geospatial Data on Multicore CPUs and GPGPUs
Global remote sensing and large-scale environment modeling have generated vast amounts of raster geospatial images. To gain a better understanding of this data, researchers are interested in performing spatial queries over it, and computing the results of those queries is greatly facilitated by the existence of spatial indices. Additionally, though there have been major advances […]
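A region quadtree encodes a raster by recursively splitting it into four quadrants until each block holds a single value. The serial, top-down sketch below assumes a square raster whose side is a power of two. It represents a node as either a leaf value or a list of four children (NW, NE, SW, SE). The name `build_quadtree` and this node layout are hypothetical. The sketch is not the parallel multicore/GPU encoding the paper describes.

```python
from typing import List, Union

Raster = List[List[int]]
QuadNode = Union[int, list]  # a leaf value, or [NW, NE, SW, SE] child nodes

def build_quadtree(img: Raster, row: int = 0, col: int = 0, size: int = -1) -> QuadNode:
    """Top-down region quadtree of a square raster whose side is a power of two."""
    if size < 0:
        size = len(img)
    first = img[row][col]
    # A uniform block (always true for a single cell) becomes a leaf.
    if all(img[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first
    h = size // 2
    return [build_quadtree(img, row, col, h),          # NW quadrant
            build_quadtree(img, row, col + h, h),      # NE quadrant
            build_quadtree(img, row + h, col, h),      # SW quadrant
            build_quadtree(img, row + h, col + h, h)]  # SE quadrant

img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 2],
       [1, 1, 3, 1]]
tree = build_quadtree(img)
```

Here three quadrants collapse to single leaves, and only the mixed SE quadrant is split down to individual cells. That compression of homogeneous regions is what makes quadtrees useful as a spatial index for rasters.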