Posts
Mar, 6
Speculative Execution on Multi-GPU Systems
The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to […]
Mar, 6
Automatic Generation of Multicore Chemical Kernels
This work presents the Kinetics Preprocessor: Accelerated (KPPA), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multicore platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRFChem and the Community Multiscale Air […]
Mar, 6
Task management for irregular-parallel workloads on the GPU
We explore software mechanisms for managing irregular tasks on graphics processing units (GPUs). We demonstrate that dynamic scheduling and efficient memory management are critical problems in achieving high efficiency on irregular workloads. We experiment with several task-management techniques, ranging from the use of a single monolithic task queue to distributed queuing with task stealing and […]
Mar, 6
Dynamic load balancing on single- and multi-GPU systems
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. […]
Mar, 5
Multimodal Image Registration Using GPU Parallel Computing Technology
This research project studies the parallel computing technique offered by the graphics processing unit (GPU), and uses it to accelerate the computation of image registration. Image registration is a process that aligns two images so that a point in one image corresponds to the same anatomical point in the other. It is a key part […]
Mar, 5
High-performance GPU based Rendering for Real-Time, rigid 2D/3D-Image Registration in Radiation Oncology
This thesis presents a comparison of high-speed rendering algorithms for use in 2D/3D-image registration in radiation oncology. Image guided radiation therapy (IGRT) is a technique for improving the treatment of cancer with ionizing radiation by adapting the treatment plan to the current situation using 2D/3D-image registration. To accelerate this procedure, also rendering of Digitally […]
Mar, 5
Phase Based Volume Registration on the GPU with Application to Quantitative MRI
We present a method for fast phase-based registration of volume data for medical applications. As the number of different modalities within medical imaging increases, it becomes increasingly important to have registration that works for a mixture of modalities. For these applications the phase-based registration approach has proven to be superior. Today there […]
Mar, 5
Phase Based Volume Registration Using CUDA
We present a method for fast phase-based registration of volume data for medical applications. As the number of different modalities within medical imaging increases, it becomes increasingly important to have registration that works for a mixture of modalities. For these applications the phase-based registration approach has proven to be superior. Today there […]
Mar, 5
Implementation of Variable Preconditioned GCR with mixed precision on GPU using CUDA
The Variable Preconditioned GCR (VPGCR) method with mixed precision on a Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA) is numerically investigated. The convergence theorem of VPGCR guarantees that the residual equation for the preconditioned procedure can be solved in the range of single precision operation. The results of computations show that VPGCR with […]
Mar, 5
Data Mining Using Graphics Processing Units
During the last few years, Graphics Processing Units (GPUs) have evolved from simple devices for preparing the display signal into powerful coprocessors that not only support typical computer graphics tasks such as rendering of 3D scenarios but can also be used for general numeric and symbolic computation tasks such as simulation and optimization. As […]
Mar, 5
Parallel implementation of wavelet-based image denoising on programmable PC-grade graphics hardware
The discrete wavelet transform (DWT) has been extensively used for image compression and denoising in the areas of image processing and computer vision. However, the intensive computation of the DWT, due to its inherent multilevel data decomposition and reconstruction operations, creates a bottleneck that drastically reduces its performance and limits its implementation in real-time applications when facing large […]
Mar, 5
Parallel Computing: The Elephant in the Room
Over the past few years, there has been a shift towards multi-core processors, driven partially by physical limitations. Mistaken assumptions about how effective and useful parallel systems can be have also provided motivation for this change. In this paper, we seek to directly identify the barriers to parallel computation. The barriers are not, as conventional […]

