Posts
Nov, 22
Dynamic adaptation and distribution of binaries to heterogeneous architectures
Real-time multimedia workloads require progressively more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real-time multimedia workloads. When even they are unable to satisfy processing power requirements, network distribution can provide many workloads with even more computing power. In this thesis, we present solutions that can be […]
Nov, 22
Efficient Shallow Water Simulations on GPUs
For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering - small fixed […]
Nov, 22
Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data
Heterogeneous systems often employ processing units with a wide spectrum of performance capabilities. Allowing individual applications to make greedy local scheduling decisions leads to imbalance, with underutilization of some devices and excessive contention for others. If we instead allow the system to make global scheduling decisions and assign some applications to a slower device, we […]
Nov, 22
Application of GPGPU for Acceleration of Short DNA Sequence Alignment in Unipro UGENE Project
A dramatic increase in available sequencing datasets has resulted in the need for fast sequence alignment methods. Plenty of novel methods have been proposed to perform fast alignment of NGS data, and some of them appear to be rather effective; however, a relatively small number of existing alignment tools use Graphics Processing Units (GPUs) to […]
Nov, 22
Parallelizing Multicore Cache Simulations using Heterogeneous Computing on General Purpose and Graphics Processors
Traditional trace-driven memory system simulation is a very time-consuming process, and the advent of multi-cores simply exacerbates the problem. We propose a framework for accelerating trace-driven multi-core cache simulations by utilizing the capabilities of modern many-core Graphics Processing Units (GPUs). A straightforward way in this direction is to rely on the inherent parallelism […]
Nov, 22
GPU-based Multi-start Local Search Algorithms
In practice, combinatorial optimization problems are complex and computationally time-intensive. Local search algorithms are powerful heuristics that can significantly reduce the computation time spent exploring the solution space. In these algorithms, the multi-start model may improve the quality and the robustness of the obtained solutions. However, solving large and time-intensive optimization problems […]
Nov, 22
Using Graphics Processors for a High Performance Normalization of Gene Expressions
Ultra-high-density oligonucleotide microarrays allow several million genetic markers to be observed in a single experiment. Current bioinformatics software for gene expression quantile data normalization is unable to process such huge datasets. In parallel with this perception, the huge volume of molecular data produced by current high-throughput technologies in modern molecular biology […]
Nov, 22
Dataflow-Based Implementation of Layered Sensing Applications
This report describes a new dataflow-based technology and associated design tools for high-productivity design, analysis, and optimization of layered sensing software for signal processing systems. Our approach provides novel capabilities, based on the principles of task-level dataflow analysis, for exploring and optimizing interactions across application behavior, operational context, high-performance embedded processing platforms, and implementation […]
Nov, 22
Experiences with Achieving Portability across Heterogeneous Architectures
The increasing computational needs of parallel applications inevitably require portability across popular parallel architectures, which are becoming heterogeneous. The lack of a common parallel framework results in divergent code bases, difficulty in porting, higher maintenance cost, and, thus, difficulty achieving optimal performance on target architectures. Our paper examines two representative parallel applications and describes code […]
Nov, 22
Superconducting proximity effect in graphene under inhomogeneous strain
The interplay between quantum Hall states and Cooper pairs is usually hindered by the suppression of the superconducting state due to the strong magnetic fields needed to observe the quantum Hall effect. From this point of view graphene is special since it allows the creation of strong pseudo-magnetic fields due to strain. We show that […]
Nov, 21
Online Adaptive Code Generation and Tuning
In this paper, we present a runtime compilation and tuning framework for parallel programs. We extend our prior work on our auto-tuner, Active Harmony, for tunable parameters that require code generation (for example, different unroll factors). For such parameters, our auto-tuner generates and compiles new code on the fly. Effectively, we merge traditional feedback-directed optimization and […]
Nov, 21
Issues in Heterogeneous GPU Clusters
In this paper, we discuss networking issues arising in the design, analysis and use for scientific computing of clusters equipped with graphics processing units. The adoption of graphics accelerators in clusters used for high-performance scientific computing is a fairly recent phenomenon and promises to be an important trend now and into the foreseeable future. After […]