Posts
Mar, 9
A Study of CUDA Acceleration and Impact of Data Transfer Overhead in Heterogeneous Environment
Along with the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications in areas such as energy, bioinformatics, and finance, among other research fields. Since there is a wide range of data sizes for which the CPU has greater performance, it is important that CUDA-enabled programs properly select when to and when not to […]
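As a purely illustrative aside (not taken from the post above): the offload decision typically hinges on measuring how long host-to-device transfers take for a given data size. A minimal CUDA sketch of that measurement, using CUDA events, might look like the following; the buffer size is an arbitrary example.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 24;                       // example size: 16M floats (64 MB)
    float *h = (float*)malloc(n * sizeof(float));
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    // Time the host-to-device copy with CUDA events
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D copy of %zu MB took %.3f ms\n", (n * sizeof(float)) >> 20, ms);

    // A host/GPU dispatch heuristic would compare this overhead (plus kernel time)
    // against the measured CPU time for the same problem size.
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return 0;
}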
Mar, 9
Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions […]
Mar, 9
Relational Algorithms for Multi-Bulk-Synchronous Processors
Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, […]
Mar, 9
NLSEmagic: Nonlinear Schrödinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes
We present a simple-to-use, yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphics processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and […]
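For orientation only, and not NLSEmagic's compact high-order scheme: a single explicit Euler time step of the 1D focusing NLSE, i u_t + u_xx + |u|^2 u = 0, with a plain second-order central difference for u_xx, could be written as the CUDA kernel below; grid size, step size, and spacing are illustrative parameters.

#include <cuComplex.h>
#include <cuda_runtime.h>

// One explicit Euler step of i*u_t + u_xx + |u|^2*u = 0 on a 1D grid.
// Boundary points are left untouched (fixed boundary values).
__global__ void nlse_euler_step(const cuFloatComplex* u, cuFloatComplex* u_new,
                                int N, float dt, float dx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= N - 1) return;

    // Second-order central difference for u_xx
    cuFloatComplex uxx = cuCsubf(cuCaddf(u[i - 1], u[i + 1]),
                                 cuCmulf(make_cuFloatComplex(2.0f, 0.0f), u[i]));
    uxx = cuCdivf(uxx, make_cuFloatComplex(dx * dx, 0.0f));

    // Nonlinear term |u|^2 * u
    float mag2 = cuCabsf(u[i]) * cuCabsf(u[i]);
    cuFloatComplex nl = cuCmulf(make_cuFloatComplex(mag2, 0.0f), u[i]);

    // u_t = i*(u_xx + |u|^2*u)  =>  u_new = u + dt * i * (u_xx + |u|^2*u)
    cuFloatComplex rhs  = cuCaddf(uxx, nl);
    cuFloatComplex irhs = make_cuFloatComplex(-cuCimagf(rhs), cuCrealf(rhs));
    u_new[i] = cuCaddf(u[i], cuCmulf(make_cuFloatComplex(dt, 0.0f), irhs));
}

In practice a higher-order spatial stencil and a stable time integrator (e.g., fourth-order Runge-Kutta) would replace the Euler step; the kernel only shows where the GPU parallelism comes from: one thread per grid point.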
Mar, 8
Scalable framework for mapping streaming applications onto multi-GPU systems
Graphics processing units leverage a large array of parallel processing cores to boost the performance of a specific streaming computation pattern frequently found in graphics applications. Unfortunately, while many other general-purpose applications do exhibit the required streaming behavior, they also possess unfavorable data layouts and poor computation-to-communication ratios that penalize any straightforward execution […]
Mar, 8
Dynamic Task-Scheduling and Resource Management for GPU Accelerators in Medical Imaging
For medical imaging applications, timely execution of tasks is essential. Hence, when running multiple applications on the same system, scheduling with the capability of task preemption and prioritization becomes mandatory. Using GPUs as accelerators in this domain imposes new challenges, since the GPU's common FIFO scheduling supports neither task prioritization nor preemption. As a remedy, […]
Mar, 8
Compiler Assisted Runtime Adaptation
In this dissertation, we address the problem of runtime adaptation of an application to its execution environment. A typical example is changing the processing element on which a computation is executed, considering the available processing elements in the system. This is done based on the information and instrumentation provided by the compiler and taking into account […]
Mar, 8
Paragon: Collaborative Speculative Loop Execution on GPU and CPU
The rise of graphics engines as one of the main parallel platforms for general purpose computing has ignited a wide search for better programming support for GPUs. Due to their non-traditional execution model, developing applications for GPUs is usually very challenging, and as a result, these devices are left under-utilized in many commodity systems. Several […]
Mar, 8
PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
Due to the wide variety of current and next-generation supercomputing architectures, the development of high-performance parallel visualization and analysis operators frequently requires rewriting the underlying algorithms for many different platforms. In order to facilitate portability, we have devised a framework for creating such operators that employs the data-parallel programming model. By writing the operators using […]
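A loose illustration of the data-parallel style such operator frameworks build on (the choice of NVIDIA's Thrust here is an assumption, since the excerpt is cut off): a threshold operator that keeps the ids of cells whose scalar value exceeds a cutoff can be phrased entirely with data-parallel primitives, so the same source can target different parallel backends.

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sequence.h>

// Predicate selecting cells whose scalar value exceeds a threshold.
struct above_threshold {
    float t;
    __host__ __device__ bool operator()(float v) const { return v > t; }
};

int main() {
    // Example scalar field on the device (values would come from a simulation).
    thrust::device_vector<float> field(1 << 20, 0.25f);
    field[42] = 0.9f;  // make at least one cell pass the threshold

    // Enumerate cell ids, then keep only the ids whose field value passes.
    thrust::device_vector<int> ids(field.size());
    thrust::sequence(ids.begin(), ids.end());

    thrust::device_vector<int> selected(field.size());
    auto end = thrust::copy_if(ids.begin(), ids.end(),   // values to copy
                               field.begin(),            // stencil: field values
                               selected.begin(),
                               above_threshold{0.5f});
    selected.resize(end - selected.begin());
    return 0;
}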
Mar, 6
Efficient Relational Algebra Algorithms and Data Structures for GPU
Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, […]
Mar, 6
PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs
GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer reliability […]
Mar, 6
KUDA: GPU Accelerated Split Race Checker
We propose a novel approach for runtime verification on computers with a large number of computation cores, without any hardware extension to the mainstream PC environment. The goal of the approach is to make use of all hardware resources to decouple the computational overhead of traditional race checkers by parallelizing the runtime verification. We distinguish between two […]