Posts

Mar, 7

Fast implementation of Wyner-Ziv Video codec using GPGPU

In this paper, we report a fast implementation of a Wyner-Ziv video decoder using general-purpose computing on graphics processing units (GPGPU). Despite its many advantages, Wyner-Ziv video coding suffers from very high decoding complexity. Since Slepian-Wolf decoding with a rate-adaptive LDPC accumulate code takes up more than 90% of the entire Wyner-Ziv video decoding complexity, […]
Mar, 7

Object-oriented stream programming using aspects

High-performance parallel programs that efficiently utilize heterogeneous CPU+GPU accelerator systems require tuned coordination among multiple program units. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this […]
Mar, 7

Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms

The move to massively parallel hybrid platforms, such as multicore CPUs accelerated with heterogeneous GPU co-processing systems, is significantly impacting software programmers because existing programs have to be properly parallelized before they can take advantage of these advanced processing architectures. However, using current programming frameworks such as CUDA leads to tangled source code that combines […]
Mar, 6

Statistical constraints on binary black hole inspiral dynamics

We perform a statistical analysis of the binary black hole problem in the post-Newtonian approximation by systematically sampling and evolving the parameter space of initial configurations for quasi-circular inspirals. Through a principal component analysis of spin and orbital angular momentum variables we systematically look for uncorrelated quantities and find three of them which are highly […]
Mar, 6

Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms

The maximum flow problem is a fundamental graph theory problem with many important applications. Max-flow algorithms based on the push-relabel method are known to have better complexity bounds and faster practical execution speed than others. However, existing push-relabel algorithms are designed for uniprocessors or parallel processors that support locking primitives, thus making it very difficult […]
Mar, 6

Design and implementation of MPEG audio layer III decoder using graphics processing units

This paper describes a newly implemented method for the MPEG audio layer III (MP3) decoder. The proposed architecture is based on a graphics processing unit (GPU) using the CUDA environment, where it can effectively take advantage of the modern GPU’s parallel computing power. The implemented system with this architecture employs a multi-thread model and memory optimization to […]
Mar, 6

Performance study of mapping irregular computations on GPUs

Recently, Graphics Processing Units (GPUs) have become increasingly capable and well-suited to general-purpose applications. As a result of the GPU’s high degree of parallelism and computational power, there has been a great deal of interest directed toward the platform for parallel application development. Much of the focus, however, has been on very regular […]
Mar, 6

Study on GPU-accelerated extraction of interconnects parasitic using CUDA and MPI

Parallel computation is application-oriented, particularly for the GPU (Graphics Processing Unit) with its inherent parallelism. This paper presents the architecture of a GPU cluster based on MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture). Results show that the acceleration ratio improves clearly, but the gain appears to diminish in large-scale GPU clusters. […]
Mar, 6

Tuned and asynchronous stencil kernels for CPU/GPU systems (thesis)

We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi’s iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider […]
Mar, 6

Speculative Execution on Multi-GPU Systems

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to […]
Mar, 6

Automatic Generation of Multicore Chemical Kernels

This work presents the Kinetics Preprocessor: Accelerated (KPPA), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multicore platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRFChem and the Community Multiscale Air […]
Mar, 6

Task management for irregular-parallel workloads on the GPU

We explore software mechanisms for managing irregular tasks on graphics processing units (GPUs). We demonstrate that dynamic scheduling and efficient memory management are critical problems in achieving high efficiency on irregular workloads. We experiment with several task-management techniques, ranging from the use of a single monolithic task queue to distributed queuing with task stealing and […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
