Apr, 23
Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms
Sparse Matrix Vector Multiplication (SpMV) is an important computational kernel in many scientific computing applications. Pipelining multiply-accumulate operations shifts SpMV from a compute-bound kernel to an I/O-bound kernel. In this paper, we propose a design methodology and hardware architecture for SpMV that seeks to utilize system memory bandwidth as efficiently as possible, by […]
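To make the I/O-bound claim concrete, here is a minimal software sketch of SpMV, assuming the matrix is stored in the common CSR (compressed sparse row) format. The abstract does not say which format the R^3 architecture uses, so `csr_spmv` and its argument names are illustrative only, not the paper's design. Each nonzero costs one multiply-accumulate but three memory reads (value, column index, and an entry of x), which is why memory bandwidth becomes the limit once the MACs are pipelined.

```python
from typing import List


def csr_spmv(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form (sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        # Per nonzero: 1 multiply-accumulate, 3 memory reads.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
y = csr_spmv([4.0, 1.0, 2.0, 3.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5],
             [1.0, 2.0, 3.0])
print(y)  # [7.0, 4.0, 18.0]
```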
Apr, 23
Parallel Computing for Accelerated Texture Classification with Local Binary Pattern Descriptors using OpenCL
In this paper, a novel parallelized implementation of rotation-invariant texture classification on heterogeneous computing platforms such as CPUs and Graphics Processing Units (GPUs) is proposed. A complete model of the LBP operator, as well as of its improved variants Complete Local Binary Patterns (CLBP) and Multi-scale Local Binary Patterns (MLBP), has been developed on a […]
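For readers unfamiliar with the operator, the sketch below shows the basic 3x3 LBP code and a rotation-invariant mapping (the minimum value over all circular bit rotations, a standard way to obtain rotation invariance). It is plain sequential Python, not the paper's OpenCL implementation, and does not cover CLBP or MLBP; the function names are hypothetical. Each pixel's code depends only on its own neighbourhood, which is what makes the operator a natural fit for one GPU work-item per pixel.

```python
from typing import List

# Offsets of the 8 neighbours, visited in circular order around the centre.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """8-bit LBP code of pixel (r, c): bit k is set if neighbour k >= centre."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def rotation_invariant(code: int, bits: int = 8) -> int:
    """Map a code to the minimum value over all circular bit rotations."""
    mask = (1 << bits) - 1
    return min(((code >> s) | (code << (bits - s))) & mask for s in range(bits))


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """Histogram of rotation-invariant codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[rotation_invariant(lbp_code(img, r, c))] += 1
    return hist
```

The resulting histogram serves as the texture descriptor; comparing histograms (for example with a chi-square distance) is one common way to classify textures.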
Apr, 23
Solving Wave Equations on Unstructured Geometries
Waves are all around us, whether in the form of sound, electromagnetic radiation, water waves, or earthquakes. Their study is an important basic tool across engineering and science disciplines. Every wave solver serving the computational study of waves faces a trade-off between two figures of merit: its computational speed and its accuracy. Discontinuous Galerkin […]
Apr, 23
GPU Scripting and Code Generation with PyCUDA
High-level scripting languages are in many ways polar opposites of GPUs. GPUs are highly parallel, subject to hardware subtleties, and designed for maximum throughput, and they offer a tremendous advance in the performance achievable for a significant number of computational problems. On the other hand, scripting languages such as Python favor ease of use over […]
Apr, 23
SW# – GPU enabled exact alignments on genome scale
Sequence alignment is one of the oldest and most famous problems in bioinformatics. Even after 45 years, for one reason or another, this problem remains relevant; current solutions are trade-offs between execution time, memory consumption and accuracy. We propose SW#, a new CUDA GPU-enabled and memory-efficient implementation of dynamic programming algorithms […]
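As background for "memory-efficient dynamic programming", here is a minimal sketch of the Smith-Waterman local alignment score with a linear gap penalty that keeps only two rows of the DP matrix, so memory grows with the length of one sequence rather than with the product of both. This is a generic textbook formulation, not SW#'s algorithm, and the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions. It returns only the score; recovering the alignment itself in linear space needs additional techniques.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    Only two DP rows are kept, so memory is O(len(b)) rather than
    O(len(a) * len(b)). Scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Cells on the same anti-diagonal do not depend on each other, which is the wavefront parallelism GPU implementations of this recurrence commonly exploit.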
Apr, 22
GPU-based Implementation of 128-bit Secure Eta Pairing Over a Binary Field
The eta pairing on a supersingular elliptic curve over the binary field GF(2^1223), which offers 128-bit security, has been studied extensively for efficient implementations. In this paper, we report our GPU-based implementations of this algorithm on an NVIDIA Tesla C2050 platform. We propose efficient parallel implementation strategies for multiplication, square, square root and inverse […]
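The field operations listed above all act on elements of GF(2^m). As a minimal sketch (not the paper's GPU strategy), field elements can be encoded as bit-vectors of polynomial coefficients: multiplication is then a carry-less (XOR-based) product followed by reduction modulo an irreducible polynomial. The abstract does not give the reduction polynomial for GF(2^1223), so the example below uses the small field GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) purely as a stand-in; `gf2m_mul` is a hypothetical name. Squaring is the special case `gf2m_mul(a, a, modulus)`.

```python
def gf2m_mul(a: int, b: int, modulus: int) -> int:
    """Multiply a and b in GF(2^m) using a polynomial basis.

    Elements are ints whose bits are polynomial coefficients; `modulus` is
    an irreducible polynomial of degree m (its highest set bit is bit m).
    """
    m = modulus.bit_length() - 1
    # Carry-less schoolbook product: addition of coefficients in GF(2) is XOR.
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    # Reduce from the top: clear each bit at position >= m by XOR-ing in
    # a shifted copy of the modulus.
    for shift in range(product.bit_length() - 1 - m, -1, -1):
        if (product >> (m + shift)) & 1:
            product ^= modulus << shift
    return product


AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, a stand-in for the paper's field
print(hex(gf2m_mul(0x57, 0x83, AES_POLY)))  # 0xc1
```

Python integers are arbitrary precision, so the same function would accept 1223-bit operands; a GPU implementation would instead split each element across machine words and parallelize the partial products.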
Apr, 22
Automatic Parallelization of a Gap Model using Java and OpenCL
Nowadays, scientists are often disappointed by the outcome of parallelizing their simulations, in spite of all the tools at their disposal. They often invest much time and money without obtaining the expected speed-up. This can stem from many factors, ranging from a poor choice of parallel architecture to a model that simply does not […]
Apr, 22
On the Efficacy of GPU-Integrated MPI for Scientific Applications
Scientific computing applications are quickly adapting to leverage the massive parallelism of GPUs in large-scale clusters. However, the current hybrid programming models require application developers to explicitly manage the disjoint host and GPU memories, thus reducing both efficiency and productivity. Consequently, GPU-integrated MPI solutions, such as MPI-ACC and MVAPICH2-GPU, have been developed that provide unified […]
Apr, 22
A General-Purpose GPU Reservoir Computer
The reservoir computer comprises a reservoir of possibly non-linear, possibly chaotic dynamics. By perturbing and taking outputs from this reservoir, its dynamics may be harnessed to solve complex problems at "the edge of chaos". One of the first forms of reservoir computer, the Echo State Network (ESN), is a form of artificial neural network that […]
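To illustrate how an ESN "perturbs" its reservoir, here is a minimal sketch of the standard state update x(t+1) = tanh(W_in u(t+1) + W x(t)). It is not the paper's GPU design. The random initialization, the reservoir size, and the row scaling are assumptions chosen for the example: scaling each row of W so its absolute sum is at most 0.9 bounds the infinity-norm of W below 1, and because tanh is 1-Lipschitz the update becomes a contraction. As a result, states started from different initial conditions converge under the same input (the echo state property). In a full ESN, only a linear readout from these states is trained, which is not shown here.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def make_reservoir(n: int, n_in: int, scale: float = 0.9,
                   seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Random input and recurrent weights (hypothetical initialisation).

    Each row of W is rescaled so its absolute sum equals `scale` < 1,
    bounding the infinity-norm of W and making the update contractive.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n)]
    w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    for row in w:
        s = sum(abs(v) for v in row)
        for j in range(n):
            row[j] *= scale / s
    return w_in, w


def esn_step(x: List[float], u: List[float], w_in: Matrix,
             w: Matrix) -> List[float]:
    """One reservoir update: x(t+1) = tanh(W_in u(t+1) + W x(t))."""
    return [math.tanh(sum(a * b for a, b in zip(w_in[i], u)) +
                      sum(a * b for a, b in zip(w[i], x)))
            for i in range(len(x))]


# Two different initial states driven by the same input sequence.
w_in, w = make_reservoir(n=20, n_in=1)
xa, xb = [1.0] * 20, [-1.0] * 20
for t in range(50):
    u = [math.sin(0.3 * t)]
    xa, xb = esn_step(xa, u, w_in, w), esn_step(xb, u, w_in, w)
print(max(abs(p - q) for p, q in zip(xa, xb)))  # small: the states have converged
```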
Apr, 22
Programming Models and Runtimes for Heterogeneous Systems
With the plateauing of processor frequencies and the increase in energy consumption in computing, application developers are seeking new sources of performance acceleration. Heterogeneous platforms with multiple processor architectures offer one possible avenue to address these challenges. However, modern heterogeneous programming models tend to be either so low-level as to severely hinder programmer productivity, or so […]
Apr, 22
Connecting Architecture, Fitness, Optimizations and Performance using an Anisotropic Diffusion Filter
Over the past decade, computing architectures have continued to exploit multiple levels of parallelism in applications. This increased interest in parallel computing has not only fueled the growth of multi-core processors but has also led to the emergence of several non-traditional computing architectures such as General-Purpose Graphics Processing Units (GP-GPUs), Cell processors, and Field Programmable […]
Apr, 22
Valar: A Benchmark Suite to Study the Dynamic Behavior of Heterogeneous Systems
Heterogeneous systems have grown in popularity within the commercial platform and application developer communities. We have seen a growing number of systems incorporating CPUs, Graphics Processors (GPUs) and Accelerated Processing Units (APUs, which combine a CPU and a GPU on the same chip). This emerging class of platforms is now being targeted to accelerate applications where the […]