Posts
Mar, 10
GPU-Accelerated Large-Eddy Simulation of Turbulent Channel Flows
High-performance computing clusters augmented with cost- and power-efficient graphics processing units (GPUs) provide new opportunities to broaden the use of the large-eddy simulation technique to study high-Reynolds-number turbulent flows in fluids engineering applications. In this paper, we extend our earlier work on multi-GPU acceleration of an incompressible Navier-Stokes solver to […]
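For context, a generic statement of the equations such a solver discretizes: the filtered incompressible Navier-Stokes equations used in large-eddy simulation, written in index notation. The paper's exact formulation, non-dimensionalization, and subgrid-scale model are not shown in this excerpt; the unclosed stress tau_ij must be supplied by a model such as Smagorinsky's.

% Filtered incompressible Navier-Stokes equations for LES (generic form).
\begin{align}
  \frac{\partial \bar{u}_i}{\partial x_i} &= 0, \\
  \frac{\partial \bar{u}_i}{\partial t}
    + \frac{\partial (\bar{u}_i \bar{u}_j)}{\partial x_j}
  &= -\frac{1}{\rho}\frac{\partial \bar{p}}{\partial x_i}
    + \nu \frac{\partial^2 \bar{u}_i}{\partial x_j \partial x_j}
    - \frac{\partial \tau_{ij}}{\partial x_j}.
\end{align}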
Mar, 10
Multi-Object Geodesic Active Contours (MOGAC): A Parallel Sparse-Field Algorithm for Image Segmentation
An important task for computer vision systems is to segment adjacent structures in images without producing gaps or overlaps. Multi-object Level Set Methods (MLSM) perform this task with the benefit of sub-pixel accuracy. However, current implementations of MLSM are not as computationally or memory efficient as their region-growing and graph-cut counterparts, which lack […]
Mar, 9
Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA
For video coding, weighing the balance between coding rate and image quality, we apply a global motion search algorithm to avoid loss of image quality, and exploit the parallel computing capacity of graphics processors to accelerate the encoding process. Based on the heterogeneous CPU+GPU system, the multi-threaded parallel structure, and the thread-synchronization features of the CUDA platform, we […]
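As a rough illustration of how a global motion search can map onto CUDA (a minimal sketch under assumed details: full search over a small displacement range, 8-bit grayscale frames, 256 threads per block; this is not the paper's algorithm or kernel organization): each thread block evaluates one candidate global displacement, and its threads cooperatively reduce a sum of absolute differences in shared memory.

// Minimal sketch: one block per candidate global motion vector.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define RANGE 8                        // candidate dx, dy in [-RANGE, RANGE]
#define CANDS (2 * RANGE + 1)
#define THREADS 256

__global__ void sadKernel(const unsigned char* cur, const unsigned char* ref,
                          int width, int height, unsigned long long* sad)
{
    int dx = (int)blockIdx.x - RANGE;  // this block's candidate displacement
    int dy = (int)blockIdx.y - RANGE;

    unsigned long long local = 0;
    for (int idx = threadIdx.x; idx < width * height; idx += blockDim.x) {
        int x = idx % width, y = idx / width;
        int sx = x + dx, sy = y + dy;
        if (sx >= 0 && sx < width && sy >= 0 && sy < height)
            local += abs((int)cur[y * width + x] - (int)ref[sy * width + sx]);
    }

    __shared__ unsigned long long partial[THREADS];
    partial[threadIdx.x] = local;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction in shared memory
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        sad[blockIdx.y * gridDim.x + blockIdx.x] = partial[0];
}

int main()
{
    const int W = 640, H = 480;
    unsigned char *dCur, *dRef;
    unsigned long long *dSad, hSad[CANDS * CANDS];
    cudaMalloc(&dCur, W * H);
    cudaMalloc(&dRef, W * H);
    cudaMalloc(&dSad, sizeof(hSad));
    cudaMemset(dCur, 0, W * H);        // dummy frames; real code copies video data here
    cudaMemset(dRef, 0, W * H);

    sadKernel<<<dim3(CANDS, CANDS), THREADS>>>(dCur, dRef, W, H, dSad);
    cudaMemcpy(hSad, dSad, sizeof(hSad), cudaMemcpyDeviceToHost);

    int best = 0;                      // host picks the minimum-SAD candidate
    for (int i = 1; i < CANDS * CANDS; i++)
        if (hSad[i] < hSad[best]) best = i;
    printf("best global motion vector: (%d, %d)\n",
           best % CANDS - RANGE, best / CANDS - RANGE);
    return 0;
}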
Mar, 9
A Study of CUDA Acceleration and Impact of Data Transfer Overhead in Heterogeneous Environment
Along with the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications in energy, bioinformatics, finance, and several other research areas. Since there is a wide range of data sizes for which the CPU has greater performance, it is important that CUDA-enabled programs properly select when to and when not to […]
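A minimal sketch of the kind of measurement that motivates such a selection (the SAXPY workload and all names are illustrative assumptions, not taken from the paper): time the host-to-device transfer separately from the kernel with CUDA events, then compare their sum against a CPU baseline for the same data size before committing to the GPU path.

// Minimal sketch: measure PCIe transfer time vs. kernel time with CUDA events.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 24;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float msCopy = 0, msKernel = 0;
    cudaEventElapsedTime(&msCopy, t0, t1);    // host-to-device transfer time
    cudaEventElapsedTime(&msKernel, t1, t2);  // kernel execution time
    printf("copy: %.2f ms, kernel: %.2f ms\n", msCopy, msKernel);
    // A CUDA-enabled program would compare (copy + kernel) against a measured
    // CPU baseline for this data size and fall back to the CPU when the
    // transfer overhead dominates.
    return 0;
}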
Mar, 9
Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions […]
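To make the grain-size dichotomy concrete, a minimal sketch (the batching scheme and all names are illustrative assumptions, not the paper's scheduler): many small work items that a CPU could process one at a time are agglomerated into a single flattened batch with an offsets array, so that one GPU launch exposes enough data-parallelism.

// Minimal sketch of work agglomeration: concatenate many small tasks into one
// batch and launch a single kernel, one block per task.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block scales one task's elements; blockIdx.x selects the task.
__global__ void processBatch(const int* offsets, float* data, int numTasks)
{
    int task = blockIdx.x;
    if (task >= numTasks) return;
    int begin = offsets[task], end = offsets[task + 1];
    for (int i = begin + threadIdx.x; i < end; i += blockDim.x)
        data[i] *= 2.0f;               // stand-in for the real per-task work
}

int main()
{
    const int numTasks = 10000, taskSize = 64;   // tiny tasks: a poor fit for one launch each
    std::vector<int> offsets(numTasks + 1);
    for (int t = 0; t <= numTasks; t++) offsets[t] = t * taskSize;
    std::vector<float> data(numTasks * taskSize, 1.0f);

    int* dOffsets; float* dData;
    cudaMalloc(&dOffsets, offsets.size() * sizeof(int));
    cudaMalloc(&dData, data.size() * sizeof(float));
    cudaMemcpy(dOffsets, offsets.data(), offsets.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dData, data.data(), data.size() * sizeof(float), cudaMemcpyHostToDevice);

    // One launch over the agglomerated batch instead of 10000 tiny launches.
    processBatch<<<numTasks, 64>>>(dOffsets, dData, numTasks);
    cudaMemcpy(data.data(), dData, data.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] = %.1f\n", data[0]);
    return 0;
}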
Mar, 9
Relational Algorithms for Multi-Bulk-Synchronous Processors
Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, […]
Mar, 9
NLSEmagic: Nonlinear Schrodinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes
We present a simple-to-use, yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schroedinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphics processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and […]
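For context, the equation such integrators target, in its general cubic form with an external potential (the exact form, sign conventions, and nondimensionalization used by NLSEmagic may differ):

% General cubic nonlinear Schroedinger equation with potential V(\mathbf{r}).
i\,\frac{\partial \Psi}{\partial t}
  = -a\,\nabla^2 \Psi + V(\mathbf{r})\,\Psi + s\,|\Psi|^2\,\Psi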
Mar, 8
Scalable framework for mapping streaming applications onto multi-GPU systems
Graphics processing units leverage a large array of parallel processing cores to boost the performance of a specific streaming computation pattern frequently found in graphics applications. Unfortunately, while many other general-purpose applications do exhibit the required streaming behavior, they also possess unfavorable data layouts and poor computation-to-communication ratios that penalize any straightforward execution […]
Mar, 8
Dynamic Task-Scheduling and Resource Management for GPU Accelerators in Medical Imaging
For medical imaging applications, timely execution of tasks is essential. Hence, when running multiple applications on the same system, scheduling with the capability of task preemption and prioritization becomes mandatory. Using GPUs as accelerators in this domain imposes new challenges, since the GPU's common FIFO scheduling does not support task prioritization and preemption. As a remedy, […]
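A minimal host-side sketch of priority-aware dispatch at kernel-launch granularity (task names, priorities, and the queue structure are illustrative assumptions; it does not preempt a running kernel, which is the harder problem such schedulers must address):

// Minimal sketch: a host-side priority queue decides launch order,
// instead of relying on the FIFO order of submission to the GPU.
#include <cstdio>
#include <queue>
#include <cuda_runtime.h>

__global__ void imagingKernel(float* buf, int n, int taskId)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += (float)taskId;   // stand-in for the real imaging task
}

struct Task {
    int id;
    int priority;                         // higher value = more urgent
    bool operator<(const Task& o) const { return priority < o.priority; }
};

int main()
{
    const int n = 1 << 20;
    float* dBuf;
    cudaMalloc(&dBuf, n * sizeof(float));
    cudaMemset(dBuf, 0, n * sizeof(float));

    std::priority_queue<Task> ready;
    ready.push({1, 10});                  // e.g., interactive reconstruction: urgent
    ready.push({2, 1});                   // e.g., batch post-processing: background
    ready.push({3, 5});

    while (!ready.empty()) {
        Task t = ready.top();             // always dispatch the most urgent task next
        ready.pop();
        imagingKernel<<<(n + 255) / 256, 256>>>(dBuf, n, t.id);
        cudaDeviceSynchronize();          // simple non-preemptive completion point
        printf("completed task %d (priority %d)\n", t.id, t.priority);
    }
    cudaFree(dBuf);
    return 0;
}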
Mar, 8
Compiler Assisted Runtime Adaptation
In this dissertation, we address the problem of runtime adaptation of the application to its execution environment. A typical example is changing the processing element on which a computation is executed, considering the available processing elements in the system. This is done based on the information and instrumentation provided by the compiler and taking into account […]
Mar, 8
Paragon: Collaborative Speculative Loop Execution on GPU and CPU
The rise of graphics engines as one of the main parallel platforms for general purpose computing has ignited a wide search for better programming support for GPUs. Due to their non-traditional execution model, developing applications for GPUs is usually very challenging, and as a result, these devices are left under-utilized in many commodity systems. Several […]
Mar, 8
PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
Due to the wide variety of current and next-generation supercomputing architectures, the development of high-performance parallel visualization and analysis operators frequently requires rewriting the underlying algorithms for many different platforms. In order to facilitate portability, we have devised a framework for creating such operators that employs the data-parallel programming model. By writing the operators using […]
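As a flavor of the data-parallel programming model (a hedged sketch using Thrust-style primitives as one example of such a layer; the framework's actual API and backends are not shown in this excerpt): an operator expressed entirely through generic primitives such as counting and stream compaction can be recompiled against different parallel backends without changing its logic.

// Minimal sketch of a data-parallel "threshold" operator: keep the scalar
// values above an isovalue using only generic parallel primitives.
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/copy.h>
#include <thrust/count.h>

struct above_iso {
    float iso;
    above_iso(float i) : iso(i) {}
    __host__ __device__ bool operator()(float v) const { return v > iso; }
};

int main()
{
    thrust::host_vector<float> h(1000);
    for (int i = 0; i < 1000; i++) h[i] = (float)i / 1000.0f;   // dummy scalar field

    thrust::device_vector<float> field = h;
    above_iso pred(0.75f);

    // Count survivors, then compact them; both are generic data-parallel
    // primitives that could equally be compiled for a multi-core CPU backend.
    int kept = thrust::count_if(field.begin(), field.end(), pred);
    thrust::device_vector<float> passed(kept);
    thrust::copy_if(field.begin(), field.end(), passed.begin(), pred);

    printf("kept %d of %d values above the isovalue\n", kept, (int)field.size());
    return 0;
}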