Posts
May, 21
A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication
Sparse matrix-vector multiplication (SMVM) is a crucial primitive used in a variety of scientific and commercial applications. Despite having significant parallelism, SMVM is a challenging kernel to optimize due to its irregular memory access characteristics. Numerous studies have proposed the use of FPGAs to accelerate SMVM implementations. However, most prior approaches focus on parallelizing multiply-accumulate […]
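The irregular memory access mentioned in this abstract can be seen in a minimal sketch of SMVM. It assumes the common compressed sparse row (CSR) layout, which the excerpt does not name and which is not necessarily what the paper's accelerator uses. The function and parameter names (`csr_matvec`, `values`, `col_idx`, `row_ptr`) are hypothetical.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float],
               col_idx: Sequence[int],
               row_ptr: Sequence[int],
               x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, so consecutive reads may land far
            # apart in memory: this is the irregular access pattern.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
result = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each row's multiply-accumulate loop is independent of the others, which is where the parallelism comes from. The indirect read of `x` is what makes memory bandwidth hard to use well.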
May, 21
Developing a compiler for the XeonPhi
The XeonPhi is a highly parallel x86 architecture chip made by Intel. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler (VPC) to this architecture and assesses its performance by comparisons of the […]
May, 21
A performance/cost evaluation for a GPU-based drug discovery application on volunteer computing
Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and thus the use of high-performance computing (HPC) platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs) has democratized the use of HPC as they push desktop computers to cluster-level […]
May, 21
A combined MPI-CUDA parallel solution of linear and nonlinear Poisson-Boltzmann equation
The Poisson-Boltzmann equation models the electrostatic potential generated by fixed charges on a polarizable solute immersed in an ionic solution. This approach is often used in computational Structural Biology to estimate the electrostatic energetic component of the assembly of molecular biological systems. In the last decades the amount of structural data concerning proteins and other […]
May, 21
2-D Impulse Noise Suppression by Recursive Gaussian Maximum Likelihood Estimation
An effective approach termed Recursive Gaussian Maximum Likelihood Estimation (RGMLE) is developed in this paper to suppress 2-D impulse noise. Two algorithms, RGMLE-C and RGMLE-CS, are derived using spatially adaptive variances, which are estimated from certainty information and from joint certainty and similarity information, respectively. To give a reliable implementation of the RGMLE-C and RGMLE-CS algorithms, […]
May, 21
A Comparison of Serial & Parallel Particle Filters for Time Series Analysis
This paper discusses the application of parallel programming techniques to the estimation of hidden Markov models via the use of a particle filter. It highlights how the Thrust parallel programming library can be used to implement a particle filter in parallel. The impact of a parallel particle filter on the running times of three different […]
May, 20
targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance
To achieve high performance on modern computers, it is vital to map algorithmic parallelism to that inherent in the hardware. From an application developer’s perspective, it is also important that code can be maintained in a portable manner across a range of hardware. Here we present targetDP, a lightweight programming layer that allows the abstraction […]
May, 20
Multi-GPU Accelerated Parallel Algorithm of Wallis Transformation for Image Enhancement
With the development of satellite remote sensing technology, the amount of satellite remote sensing data obtained will increase rapidly. Consequently, the Wallis transformation faces challenges such as large data size, high intensity, high computational complexity, and a large computational load. A fast algorithm and efficient implementation of Wallis filtering […]
May, 20
Exploiting Parallelism in GPUs
Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget. Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications […]
May, 20
Parallel Approaches to Edit Distance and Approximate String Matching
In this paper, we explore approaches to parallelizing the edit distance problem and the related approximate string matching problem. The edit distance is a measure of the number of individual character insertions, deletions, and substitutions required to transform one string into another string. In the canonical dynamic programming solution to the edit distance, a chain […]
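As an illustration of the definition in this abstract, here is a minimal serial sketch of the canonical dynamic programming solution. It is the textbook algorithm, not the parallel approaches the paper explores. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions, deletions,
    and substitutions needed to transform a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a matching character
            ))
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

Each cell depends on its left, upper, and upper-left neighbours, so the table cannot be filled in an arbitrary order. That dependency is what makes parallelizing the problem non-trivial.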
May, 20
A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU
Power and energy consumption are becoming an increasing concern in high performance computing. Compared to multi-core CPUs, GPUs have a much better performance per watt. In this paper we discuss efforts to redesign the most computation-intensive parts of BLAST, an application that solves the equations for compressible hydrodynamics with high-order finite elements, using […]
May, 18
An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems
We address the problem that multicore DSP systems do not support OpenCL programming. We designed a compiler and propose a runtime framework for TI multicore DSPs, through which OpenCL parallel programs can take advantage of multicore computing resources. First, we use the LLVM and Clang compiler front end to achieve source-to-source translation, and in the next […]