Posts
Nov, 3
Accelerating Inclusion-based Pointer Analysis on Heterogeneous CPU-GPU Systems
This paper describes the first implementation of Andersen’s inclusion-based pointer analysis for C programs on a heterogeneous CPU-GPU system, where both its CPU and GPU cores are used. As an important graph algorithm, Andersen’s analysis is difficult to parallelise because it makes extensive modifications to the structure of the underlying graph, in a way that […]
Nov, 3
N-Body Simulation Using GP-GPU: Evaluating Host/Device Memory Transference Overhead
N-Body simulation algorithms are amongst the most commonly used within the field of scientific computing. Especially in computational astrophysics, they are used to simulate gravitational scenarios for solar systems or galactic collisions. Parallel versions of such N-Body algorithms have been extensively designed and optimized for multicore and distributed computing schemes. However, N-Body algorithms are still […]
Nov, 2
Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU
Image registration is widely used in biomedical imaging, but biomedical images contain too much texture and noise for precise registration. Achieving excellent registration performance requires more complex image processing, which incurs a high computational cost. For the real time issue, this paper […]
Nov, 2
Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems
For software to fully exploit the computing power of emerging heterogeneous computers, not only must the required computational kernels be optimized for the specific hardware architectures but also an effective scheduling scheme is needed to utilize the available heterogeneous computational units and to hide the communication between them. As a case study, we develop a […]
Nov, 2
Adjoint Algorithmic Differentiation of a GPU Accelerated Application
We consider a GPU accelerated program using Monte Carlo simulation to price a basket call option on 10 FX rates driven by a 10 factor local volatility model. We develop an adjoint version of this program using algorithmic differentiation. The code uses mixed precision. For our test problem of 10,000 sample paths with 360 Euler […]
Nov, 2
An MPI-CUDA Implementation for the Compression of DEM
A high performance terrain data compression method is proposed based on the discrete wavelet transform (DWT) and parallel run-length coding. However, implementing the schemes to solve these models in realistic scenarios imposes huge demands on computing power. Graphics Processing Units (GPUs) programmed with the Compute Unified Device Architecture (CUDA) are rapidly becoming a major choice in […]
Nov, 2
Communication Optimization for Multi GPU Implementation of Smith-Waterman Algorithm
GPU parallelism can achieve enormous performance gains for real applications. CPU-GPU communication is one of the major bottlenecks that limit these gains. Among several libraries developed so far to optimize this communication, DyManD (Dynamically Managed Data) provides better communication optimization strategies and achieves better performance on a single GPU. Smith-Waterman is a well known […]
Oct, 30
GPU Accelerated Blood Flow Computation using the Lattice Boltzmann Method
We propose a numerical implementation based on a Graphics Processing Unit (GPU) to accelerate the execution of the Lattice Boltzmann Method (LBM). The study focuses on the application of the LBM to patient-specific blood flow computations, and hence, to obtain higher accuracy, double precision computations are employed. The LBM specific operations are […]
Oct, 30
GPU-Based Image Segmentation Using Level Set Method With Scaling Approach
In recent years, with the development of graphics processors, graphics cards have been widely used to perform general-purpose calculations. Especially since the release of the CUDA C programming language in 2007, most researchers have used CUDA C for processes that need high performance computing. In this paper, a scaling approach for […]
Oct, 30
An Evolutionary Approach to Parallel Computing Using GPU
Within a few years, the programmable graphics processing unit has evolved into a high-performance computing platform. Simple data-parallel constructs enable the use of the GPU as a streaming coprocessor, and a compiler and runtime system abstracts and virtualizes many aspects of graphics hardware. Commodity graphics hardware has rapidly evolved from being a fixed-function pipeline […]
Oct, 30
The Plasma Simulation Code: A modern particle-in-cell code with load-balancing and GPU support
Recent increases in supercomputing power, driven by the multi-core revolution and accelerators such as the IBM Cell processor, graphics processing units (GPUs) and Intel’s Many Integrated Core (MIC) technology have enabled kinetic simulations of plasmas at unprecedented resolutions, but changing HPC architectures also come with challenges for writing efficient numerical codes. This paper describes the […]
Oct, 30
Analysis of Parallel Sorting Algorithms on Heterogeneous Processors with OpenCL
A heterogeneous computing platform with tremendous raw capacity can easily be constructed from the available multi-core processors, high-capacity FPGAs and GPUs, and can include any number of these computing units. However, the challenge faced until now has been the lack of a standardized framework under which the computational tasks and data of applications could […]