Posts
Mar, 8
Frame-based parallelization of MPEG-4 on compute unified device architecture (CUDA)
Due to its object based nature, flexible features and provision for user interaction, MPEG-4 encoder is highly suitable for parallelization. The most critical and time-consuming operation of encoder is motion estimation. Nvidia’s general-purpose graphical processing unit (GPGPU) architecture allows for a massively parallel stream processor model at a very cheap price (in a few thousands […]
Mar, 7
IP routing processing with graphic processors
Throughput and programmability have always been the central, but generally conflicting concerns for modern IP router designs. Current high performance routers depend on proprietary hardware solutions, which make it difficult to adapt to ever-changing network protocols. On the other hand, software routers offer the best flexibility and programmability, but could only achieve a throughput one […]
Mar, 7
Application-guided tool development for architecturally diverse computation
Architecturally diverse computation exploits non-traditional computing platforms (e.g., field-programmable gate arrays, graphics processors, heterogeneous chip multiprocessors) to execute user applications. We have designed the Auto-Pipe tool set with the goal of easing the task of developing applications for architecturally diverse systems. Prior to and during the course of Auto-Pipe’s design, we have developed a number […]
Mar, 7
Non-blocking programming on multi-core graphics processors: (extended asbtract)
This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models […]
Mar, 7
CUDA-based AES parallelization with fine-tuned GPU memory utilization
Current Graphics Processing Unit (GPU) presents large potentials in speeding up computationally intensive data parallel applications over traditional parallelization approaches since there are much more hardware threads inside GPUs than the computational cores available to common CPU threads. NVIDIA developed a generic GPU programming platform, CUDA, which allows programmers to utilize GPU through C programming […]
Mar, 7
Designing scalable many-core parallel algorithms for min graphs using CUDA
Removing redundant edges on a large graph is a fundamental problem in many practical applications such as verification of real-time systems and network routing. In this paper, we present the designs of scalable and efficient parallel algorithms for multiple many-core GPU devices using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. […]
Mar, 7
A tile-based parallel Viterbi algorithm for biological sequence alignment on GPU with CUDA
The Viterbi algorithm is the compute-intensive kernel in Hidden Markov Model (HMM) based sequence alignment applications. In this paper, we investigate extending several parallel methods, such as the wave-front and streaming methods for the Smith-Waterman algorithm, to achieve a significant speed-up on a GPU. The wave-front method can take advantage of the computing power of […]
Mar, 7
Efficient parallel algorithms for maximum-density segment problem
One of the fundamental problems involving DNA sequences is to find high density segments of certain widths, for example, those regions with intensive guanine and cytosine (GC). Formally, given a sequence, each element of which has a value and a width, the maximum-density segment problem asks for the segment with the maximum density while satisfying […]
Mar, 7
Fast implementation of Wyner-Ziv Video codec using GPGPU
In this paper, we report a fast implementation of Wyner-Ziv video decoder using general-purpose computing on graphics processing units (GPGPU). Despite of its many advantages, Wyner-Ziv video coding has a problem of huge decoding complexity. Since Slepian-Wolf decoding with rate adaptive LDPC accumulate code takes up more than 90% of entire Wyner-Ziv video decoding complexity, […]
Mar, 7
Object-oriented stream programming using aspects
High-performance parallel programs that efficiently utilize heterogeneous CPU+GPU accelerator systems require tuned coordination among multiple program units. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this […]
Mar, 7
Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms
The move to massively parallel hybrid platforms, such as multicore CPUs accelerated with heterogeneous GPU co-processing systems, is significantly impacting software programmers because existing programs have to be properly parallelized before they can take advantage of these advanced processing architectures. However, using current programming frameworks such as CUDA leads to tangled source code that combines […]
Mar, 6
Statistical constraints on binary black hole inspiral dynamics
We perform a statistical analysis of the binary black hole problem in the post-Newtonian approximation by systematically sampling and evolving the parameter space of initial configurations for quasi-circular inspirals. Through a principal component analysis of spin and orbital angular momentum variables we systematically look for uncorrelated quantities and find three of them which are highly […]