Posts
Mar, 6
Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms
The maximum flow problem is a fundamental graph theory problem with many important applications. Max-flow algorithms based on the push-relabel method are known to have better complexity bounds and faster execution in practice than other max-flow algorithms. However, existing push-relabel algorithms are designed for uniprocessors or parallel processors that support locking primitives, thus making it very difficult […]
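For readers new to push-relabel, here is a minimal sketch of the textbook sequential FIFO variant in Python. It assumes integer capacities, a dense capacity matrix, and a hypothetical function name `max_flow_push_relabel`. It is not the CPU-GPU hybrid algorithm the paper proposes. It runs on a single thread, so it sidesteps the locking problem the abstract describes.

```python
from collections import deque


def max_flow_push_relabel(n, edges, s, t):
    """Sequential FIFO push-relabel (textbook version, not the paper's GPU algorithm).

    n: number of vertices; edges: list of (u, v, capacity); returns the max-flow value.
    """
    cap = [[0] * n for _ in range(n)]  # dense capacity matrix (simplifying assumption)
    for u, v, c in edges:
        cap[u][v] += c
    flow = [[0] * n for _ in range(n)]  # antisymmetric: flow[v][u] == -flow[u][v]
    height = [0] * n
    excess = [0] * n
    height[s] = n  # the source starts at height n

    # Preflow: saturate every edge leaving the source.
    for v in range(n):
        if cap[s][v] > 0:
            flow[s][v] = cap[s][v]
            flow[v][s] = -cap[s][v]
            excess[v] += cap[s][v]
            excess[s] -= cap[s][v]

    # FIFO queue of active vertices (positive excess, not s or t).
    active = deque(v for v in range(n) if v not in (s, t) and excess[v] > 0)
    while active:
        u = active.popleft()
        # Discharge u: push along admissible edges, relabel when none remain.
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                residual = cap[u][v] - flow[u][v]
                if residual > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual)
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    excess[v] += delta
                    # v just became active (its excess was 0 before this push).
                    if v not in (s, t) and excess[v] == delta:
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u one level above its lowest residual neighbour.
                height[u] = 1 + min(
                    height[v] for v in range(n) if cap[u][v] - flow[u][v] > 0
                )
    return excess[t]  # all flow that reached the sink
```

On a small six-vertex network with edges (0,1,16), (0,2,13), (1,3,12), (2,1,4), (2,4,14), (3,2,9), (3,5,20), (4,3,7), (4,5,4), calling `max_flow_push_relabel(6, edges, 0, 5)` returns 23. The pushes are local to one vertex and its neighbours, which is what makes the method attractive to parallelize. Concurrent pushes and relabels on shared vertices are also why parallel versions have traditionally needed locks.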
Mar, 6
Design and implementation of MPEG audio layer III decoder using graphics processing units
This paper describes a new method for implementing an MPEG audio layer III (MP3) decoder. The proposed architecture is based on a graphics processing unit (GPU) and the CUDA environment, which lets it effectively exploit the parallel computing power of modern GPUs. The system implemented with this architecture employs a multi-thread model and memory optimization to […]
Mar, 6
Performance study of mapping irregular computations on GPUs
Recently, Graphics Processing Units (GPUs) have become increasingly capable and well suited to general-purpose applications. Because of GPUs' high degree of parallelism and computational power, there has been a great deal of interest in the platform for parallel application development. Much of the focus, however, has been on very regular […]
Mar, 6
Study on GPU-accelerated extraction of interconnects parasitic using CUDA and MPI
Parallel computation is application-oriented, particularly on the GPU (Graphics Processing Unit), with its inherent parallelism. This paper shows the architecture of a GPU cluster based on MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture). Results show that the speedup ratio improves markedly, but the speedup gains appear to diminish on large-scale GPU clusters. […]
Mar, 6
Tuned and asynchronous stencil kernels for CPU/GPU systems (thesis)
We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider […]
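The Jacobi iteration behind this work is easy to state. The sketch assumes the form -∇²u = f with u = 0 on the boundary and the standard 5-point stencil with grid spacing h. Each interior value is replaced by the average of its four neighbours plus h²f/4, using only values from the previous sweep. The following minimal pure-Python sketch uses a hypothetical function name `jacobi_poisson`. It has none of the tuning, asynchrony, or multi-CPU/multi-GPU partitioning the thesis studies.

```python
def jacobi_poisson(f, h, iterations):
    """Plain Jacobi sweeps for -laplace(u) = f on an n x n grid, u = 0 on the boundary.

    f: n x n list of source values (boundary entries are ignored); h: grid spacing.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]  # initial guess; boundary rows/columns stay 0
    for _ in range(iterations):
        new = [row[:] for row in u]  # second buffer: reads only touch the old iterate
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # 5-point stencil update for -laplace(u) = f.
                new[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    + h * h * f[i][j]
                )
        u = new
    return u
```

Every interior point in a sweep depends only on the previous iterate, so all points can be updated independently. That is what maps Jacobi so naturally onto GPU threads. The double buffer is also why the method is typically memory-bandwidth bound, matching the abstract's focus on streaming bandwidth. As a check, with f = 2x(1-x) + 2y(1-y) on the unit square the exact solution x(1-x)y(1-y) is also the exact fixed point of the discrete stencil, so enough sweeps reproduce it to rounding error.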
Mar, 6
Speculative Execution on Multi-GPU Systems
The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to […]
Mar, 6
Automatic Generation of Multicore Chemical Kernels
This work presents the Kinetics Preprocessor: Accelerated (KPPA), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multicore platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRFChem and the Community Multiscale Air […]
Mar, 6
Task management for irregular-parallel workloads on the GPU
We explore software mechanisms for managing irregular tasks on graphics processing units (GPUs). We demonstrate that dynamic scheduling and efficient memory management are critical problems in achieving high efficiency on irregular workloads. We experiment with several task-management techniques, ranging from the use of a single monolithic task queue to distributed queuing with task stealing and […]
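The idea of distributed queuing with task stealing can be illustrated with a small, deterministic, round-based simulation rather than real GPU code. In this sketch each worker runs tasks from the bottom of its own deque. A worker whose deque is empty first steals one task from the top of the longest other deque. The function name `simulate_work_stealing` and the one-round-per-task cost model are hypothetical simplifications, not the paper's mechanism.

```python
from collections import deque


def simulate_work_stealing(initial_tasks):
    """Round-based work-stealing simulation (hypothetical cost model: 1 task per round).

    initial_tasks: one list of task ids per worker. Returns (rounds, executed),
    where executed[w] lists the task ids worker w ran, in order.
    """
    queues = [deque(tasks) for tasks in initial_tasks]
    executed = [[] for _ in queues]
    rounds = 0
    while any(queues):
        for w, q in enumerate(queues):
            if not q:
                # Idle worker: steal from the top (left end) of the longest deque.
                victim = max(range(len(queues)), key=lambda k: len(queues[k]))
                if queues[victim]:
                    q.append(queues[victim].popleft())
            if q:
                # Owner takes work from the bottom (right end) of its own deque.
                executed[w].append(q.pop())
        rounds += 1
    return rounds, executed
```

With all eight tasks initially on one of four workers, `simulate_work_stealing([list(range(8)), [], [], []])` finishes in 2 rounds, whereas the loaded worker alone would need 8. Letting owners and thieves work at opposite ends of a deque follows common work-stealing designs, where it reduces contention between them. A single monolithic queue would instead make every worker contend for the same end.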
Mar, 6
Dynamic load balancing on single- and multi-GPU systems
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. […]
Mar, 5
Multimodal Image Registration Using GPU Parallel Computing Technology
This research project studies the parallel computing capabilities offered by the graphics processing unit (GPU) and uses them to accelerate the computation of image registration. Image registration is a process that aligns two images so that each point in one image corresponds to the same anatomical point in the other. It is a key part […]
Mar, 5
High-performance GPU based Rendering for Real-Time, rigid 2D/3D-Image Registration in Radiation Oncology
This thesis presents a comparison of high-speed rendering algorithms for application in 2D/3D-image registration in radiation oncology. Image-guided radiation therapy (IGRT) is a technique for improving the treatment of cancer with ionizing radiation by adapting the treatment plan to the current situation using 2D/3D-image registration. To accelerate this procedure, also rendering of Digitally […]
Mar, 5
Phase Based Volume Registration on the GPU with Application to Quantitative MRI
We present a method for fast phase-based registration of volume data for medical applications. As the number of different modalities within medical imaging increases, registration that works across a mixture of modalities becomes increasingly important. For these applications, the phase-based registration approach has proven to be superior. Today there […]