Posts
Jun, 30
Intra-Application Data-Communication Characterization
The growing demand of processing power is being satisfied mainly by an increase in the number of computing cores in a system. One of the main challenges to be addressed is efficient utilization of these architectures. This demands data-communication aware mapping of applications on these architectures. Appropriate tools are required to provide the detailed intra-application […]
Jun, 30
Compiling High Performance Recursive Filters
Infinite impulse response (IIR) or recursive filters, are essential for image processing because they turn expensive large-footprint convolutions into operations that have a constant cost per pixel regardless of kernel size. However, their recursive nature constrains the order in which pixels can be computed, severely limiting both parallelism within a filter and memory locality across […]
Jun, 30
Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs
Sequence alignment lies at heart of the bioinformatics.The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to […]
Jun, 26
Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation
Convolutional Neural Networks (CNNs) can be shifted across 2D images or 3D videos to segment them. They have a fixed input size and typically perceive only small local contexts of the pixels to be classified as foreground or background. In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in […]
Jun, 26
Concurrent Solutions to Linear Systems using Hybrid CPU/GPU Nodes
We investigate the parallel solutions to linear systems with the application focus as the global illumination problem in computer graphics. An existing CPU serial implementation using the radiosity method is given as the performance baseline where a scene and corresponding form-factor coefficients are provided. The initial computational radiosity solver uses the basic Jacobi method with […]
Jun, 26
Composability of parallel codes on heterogeneous architectures
To face the ever demanding requirements in term of accuracy and speed of scientific simulations, the High Performance community is constantly increasing the demands in term of parallelism, adding thus tremendous value to parallel libraries strongly optimized for highly complex architectures.Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great […]
Jun, 26
Block Time Step Storage Scheme for Astrophysical N-body Simulations
Astrophysical research in recent decades has made significant progress thanks to the availability of various N-body simulation techniques. With the rapid development of high-performance computing technologies, modern simulations have been able to take the computing power of massively parallel clusters with more than 10^5 GPU cores. While unprecedented accuracy and dynamical scales have been achieved, […]
Jun, 26
Ebb: A DSL for Physical Simluation on CPUs and GPUs
Designing programming environments for physical simulation is challenging because simulations rely on diverse algorithms and geometric domains. These challenges are compounded when we try to run efficiently on heterogeneous parallel architectures. We present Ebb, a domain-specific language (DSL) for simulation, that runs efficiently on both CPUs and GPUs. Unlike previous DSLs, Ebb uses a three-layer […]
Jun, 24
Toward a Multi-level Parallel Framework on GPU Cluster with PetSC-CUDA for PDE-based Optical Flow Computation
In this work we present a multi-level parallel framework for the Optical Flow computation on a GPUs cluster, equipped with a scientific computing middleware (the PetSc library). Starting from a flow-driven isotropic method, which models the optical flow problem through a parabolic partial differential equation (PDE), we have designed a parallel algorithm and its software […]
Jun, 24
Alpha-Beta Divergences Discover Micro and Macro Structures in Data
Although recent work in non-linear dimensionality reduction investigates multiple choices of divergence measure during optimization (Yang et al., 2013; Bunte et al., 2012), little work discusses the direct effects that divergence measures have on visualization. We study this relationship, theoretically and through an empirical analysis over 10 datasets. Our works shows how the alpha and […]
Jun, 24
Toward GPU Accelerated Data Stream Processing
In recent years, the need for continuous processing and analysis of data streams has increased rapidly. To achieve high throughput-rates, stream-applications make use of operator-parallelization, batching-strategies and distribution. Another possibility is to utilize co-processors capabilities per operator. Further, the database community noticed, that a column-oriented architecture is essential for efficient co-processing, since the data transfer […]
Jun, 24
DCT-JPEG Image Coding Based on GPU
In this paper, the parallel algorithm of JPEG coding based on GPU is proposed, most image compression systems have efficiency problem and the real-time of wireless multimedia sensor networks (WMSN) which used in image compression and transmission is also an issue need to be solved, so in this paper parallel computation is used in JPEG […]