Posts
Nov, 23
Automated architecture-aware mapping of streaming applications onto GPUs
Graphic Processing Units (GPUs) are made up of many streaming multiprocessors, each consisting of processing cores that interleave the execution of a large number of threads. Groups of threads – called warps and wave fronts, respectively, in nVidia and AMD literature – are selected by the hardware scheduler and executed in lockstep on the available […]
Nov, 23
A Parallel Deconvolution Algorithm in Perfusion Imaging
In this paper, we will present the implementation of a deconvolution algorithm for brain perfusion quantification on GPGPU (General Purpose Graphics Processor Units) using the CUDA programming model. GPUs originated as graphics generation dedicated co-processors, but the modern GPUs have evolved to become a more general processor capable of executing scientific computations. It provides a […]
Nov, 23
Real-World Constraints of GPUs in Real-Time Systems
Graphics processing units (GPUs) are becoming increasingly important in today’s platforms as their increased generality allows for them to be used as powerful coprocessors. In this paper, we explore possible applications for GPUs in real-time systems, discuss the limitations and constraints imposed by current GPU technology, and present a summary of our research addressing many […]
Nov, 23
Soren: Adaptive MapReduce for Programmable GPUs
In recent years the MapReduce programming model has been widely used for developing parallel data-intensive applications. As a result of its popularity, there exist many implementations of the MapReduce model on different parallel architectures including on massively parallel programmable GPUs. A basic challenge in implementing a MapReduce runtime system is the wide diversity of applications […]
Nov, 23
Towards solving the Table Maker’s Dilemma on GPU
Since 1985, the IEEE 754 standard defines formats, rounding modes and basic operations for floating-point arithmetic. In 2008 the standard has been extended, and recommendations have been added about the rounding of some elementary functions such as trigonometric functions (cosine, sine, tangent and their inverses), exponentials, and logarithms. However to guarantee the exact rounding of […]
Nov, 23
Accelerating Protein Sequence Search in a Heterogeneous Computing System
The "Basic Local Alignment Search Tool” (BLAST) is arguably the most widely used computational tool in bioinformatics. However, the computational power required for routine BLAST analysis has been outstripping Moore’s Law due to the exponential growth in the size of the genomic sequence databases that BLAST searches on. To address the above issue, we propose […]
Nov, 23
Building-Blocks for Performance Oriented DSLs
Domain-specific languages raise the level of abstraction in software development. While it is evident that programmers can more easily reason about very high-level programs, the same holds for compilers only if the compiler has an accurate model of the application domain and the underlying target platform. Since mapping high-level, general-purpose languages to modern, heterogeneous hardware […]
Nov, 23
TEG: GPU Performance Estimation Using a Timing Model
Modern Graphic Processing Units (GPUs) offer significant performance speedup over conventional processors. Programming on GPU for general purpose applications has become an important research area. CUDA programming model provides a C-like interface and is widely accepted. However, since hardware vendors do not disclose enough underlying architecture details, programmers have to optimize their applications without fully […]
Nov, 23
Accelerating the Rate of Astronomical Discovery with GPU-Powered Clusters
In recent years, the Graphics Processing Unit (GPU) has emerged as a low-cost alternative for high performance computing, enabling impressive speed-ups for a range of scientific computing applications. Early adopters in astronomy are already benefiting in adapting their codes to take advantage of the GPU’s massively parallel processing paradigm. I give an introduction to, and […]
Nov, 23
An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm
Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been proposed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the […]
Nov, 22
Dynamic adaptation and distribution of binaries to heterogeneous architectures
Real time multimedia workloads require progressingly more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real time multimedia workloads. When even they are unable to satisfy processing power requirements, network-distribution can provide many workloads with even more computing power. In this thesis, we present solutions that can be […]
Nov, 22
Efficient Shallow Water Simulations on GPUs
For some classes of problems, NVIDIA CUDA abstraction and hardware properties combine with problem characteristics to limit the specific problem instances that can be effectively accelerated. As a real-world example, a twodimensional correlation-based template-matching MATLAB application is considered. While this problem has a well known solution for the common case of linear image filtering-small fixed […]