Posts
Jul, 3
Two Stage Data Mining Technique for Fast Monsoon Onset Prediction
The onset of the monsoon is eagerly awaited in the Indian subcontinent because it has a deep impact on the economic and social domains, and it has therefore been monitored and studied in great depth. With the advent of satellite imagery, it’s now possible to monitor the different parameters that affect or are affected by the monsoon in […]
Jul, 3
Parallel Processing using FPGAs and GPUs
This report covers the use of parallel architectures such as Graphics Processing Units (GPUs) for general-purpose computation. It also covers filter design using Field Programmable Gate Arrays (FPGAs), exploiting their inherently parallel nature. Least Mean Square filters, a class of adaptive filters, are implemented on a Xilinx Virtex 5 FPGA and tested […]
Jul, 3
Using OpenCL: Programming Massively Parallel Computers
In 2011 many computer users were exploring the opportunities and benefits of the massive parallelism offered by heterogeneous computing. In 2000 the Khronos Group, a not-for-profit industry consortium, was founded to create open standard APIs for parallel computing, graphics and dynamic media. Among these is OpenCL, an open system for programming heterogeneous computers […]
Jul, 3
On the Use of GPUs in Realizing Cost-Effective Distributed RAID
The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and […]
Jul, 2
kANN on the GPU with Shifted Sorting
We describe the implementation of a simple method for finding k approximate nearest neighbors (ANNs) on the GPU. While the performance of most ANN algorithms depends heavily on the distributions of the data and query points, our approach has a very regular data access pattern. It performs as well as state of the art methods […]
Jul, 2
Acceleration of bilateral filtering algorithm for manycore and multicore architectures
This work explores multicore and manycore acceleration for the embarrassingly parallel, compute-intensive bilateral filtering kernel. For manycore architectures, we have created a pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by using low-level single instruction, multiple data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such […]
Jul, 2
Deformation of skeleton based implicit objects
In this paper we present a precise contact modeling environment for skeleton-based implicit objects. To render a scene composed of these implicit objects, we have implemented a state-of-the-art raycasting algorithm, called marching points, on the GPU using CUDA. Further, we show how to interactively deform the implicit objects when they collide. To achieve this we […]
Jul, 2
Halo Gathering Scalability for Large Scale Multi-dimensional Sznajd Opinion Models Using Data Parallelism with GPUs
The Sznajd model of opinion formation exhibits complex phase transitional and growth behaviour and can be studied with numerical simulations on a number of different network structures. Large system sizes and detailed statistical sampling of the model both require data-parallel computing to accelerate simulation performance. Data structures and computational performance issues are reported for simulations […]
Jul, 2
Computationally Efficient Algorithms for Evaluation of Statistical Descriptors
Homogenization methods are becoming the most popular approach to modelling of heterogeneous materials. The main principle is to represent the heterogeneous microstructure with an equivalent homogeneous material. When dealing with the complex random microstructures, the unit cell representing exactly periodic morphology needs to be replaced by a statistically equivalent periodic unit cell (SEPUC) preserving the […]
Jul, 2
API-Compiling for Image Hardware Accelerators
We present an API-based compilation strategy to optimize image applications, developed using a high level image processing library, onto three different image processing hardware accelerators. The library API provides the semantics of the image computations. The three image accelerator targets are quite distinct: the first one uses a vector architecture; the second one presents a […]
Jul, 2
Parallelization Strategies of the Canny Edge Detector for Multi-core CPUs and Many-core GPUs
In this paper we study two parallelization strategies (loop-level parallelism and domain decomposition), and we investigate their impact in terms of performance and scalability on two different parallel architectures. As a test application, we use the Canny Edge Detector due to its wide range of parallelization opportunities, and its frequent use in computer vision applications. […]
Jul, 1
The Fat-Link Computation On Large GPU Clusters for Lattice QCD
Graphics Processing Units (GPU) are becoming increasingly popular in high performance computing due to their high performance, high power efficiency and low cost. In this paper, we present results of an effort to implement the fatlink computation – an important component of many lattice quantum chromodynamics (LQCD) calculations – on GPU clusters using the QUDA […]