
Posts

Nov, 23

Building-Blocks for Performance Oriented DSLs

Domain-specific languages raise the level of abstraction in software development. While it is evident that programmers can more easily reason about very high-level programs, the same holds for compilers only if the compiler has an accurate model of the application domain and the underlying target platform. Since mapping high-level, general-purpose languages to modern, heterogeneous hardware […]
Nov, 23

TEG: GPU Performance Estimation Using a Timing Model

Modern Graphics Processing Units (GPUs) offer significant speedups over conventional processors. Programming GPUs for general-purpose applications has become an important research area. The CUDA programming model provides a C-like interface and is widely accepted. However, since hardware vendors do not disclose enough underlying architecture details, programmers have to optimize their applications without fully […]
Nov, 23

Accelerating the Rate of Astronomical Discovery with GPU-Powered Clusters

In recent years, the Graphics Processing Unit (GPU) has emerged as a low-cost alternative for high-performance computing, enabling impressive speed-ups for a range of scientific computing applications. Early adopters in astronomy are already benefiting from adapting their codes to take advantage of the GPU’s massively parallel processing paradigm. I give an introduction to, and […]
Nov, 23

An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm

Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been proposed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the […]
Nov, 22

Dynamic adaptation and distribution of binaries to heterogeneous architectures

Real-time multimedia workloads require progressively more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real-time multimedia workloads. When even they are unable to satisfy processing power requirements, network distribution can provide many workloads with even more computing power. In this thesis, we present solutions that can be […]
Nov, 22

Efficient Shallow Water Simulations on GPUs

For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a two-dimensional correlation-based template-matching MATLAB application is considered. While this problem has a well-known solution for the common case of linear image filtering-small fixed […]
Nov, 22

Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data

Heterogeneous systems often employ processing units with a wide spectrum of performance capabilities. Allowing individual applications to make greedy local scheduling decisions leads to imbalance, with underutilization of some devices and excessive contention for others. If we instead allow the system to make global scheduling decisions and assign some applications to a slower device, we […]
Nov, 22

Application of GPGPU for Acceleration of Short DNA Sequence Alignment in Unipro UGENE Project

A dramatic increase in available sequencing datasets has created a need for fast sequence alignment methods. Many novel methods have been proposed for fast alignment of NGS data, and some of them have proved rather effective; however, a relatively small number of existing alignment tools use Graphics Processing Units (GPUs) to […]
Nov, 22

Parallelizing Multicore Cache Simulations using Heterogeneous Computing on General Purpose and Graphics Processors

Traditional trace-driven memory system simulation is a very time-consuming process, and the advent of multi-cores only exacerbates the problem. We propose a framework for accelerating trace-driven multi-core cache simulations by utilizing the capabilities of modern many-core Graphics Processing Units (GPUs). A straightforward approach in this direction is to rely on the inherent parallelism […]
Nov, 22

GPU-based Multi-start Local Search Algorithms

In practice, combinatorial optimization problems are complex and computationally time-intensive. Local search algorithms are powerful heuristics that can significantly reduce the computation time spent exploring the solution space. In these algorithms, the multi-start model may improve the quality and the robustness of the obtained solutions. However, solving large and time-intensive optimization problems […]
Nov, 22

Using Graphics Processors for a High Performance Normalization of Gene Expressions

Ultra-high-density oligonucleotide microarrays allow several million genetic markers to be observed in a single experiment. Current bioinformatics software for gene expression quantile data normalization is unable to process such huge datasets. In parallel with this perception, the huge volume of molecular data produced by current high-throughput technologies in modern molecular biology […]
Nov, 22

Dataflow-Based Implementation of Layered Sensing Applications

This report describes a new dataflow-based technology and associated design tools for high-productivity design, analysis, and optimization of layered sensing software for signal processing systems. Our approach provides novel capabilities, based on the principles of task-level dataflow analysis, for exploring and optimizing interactions across application behavior; operational context; high-performance embedded processing platforms; and implementation […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
