
Posts

Mar, 8

Porting of an Edge-Based CFD Solver to GPUs

Graphics processing units (GPUs) are increasingly becoming a mainstream platform for high performance computational fluid dynamics. This paper describes the porting of a substantial portion of FEFLO, an adaptive, edge-based finite element code for the solution of compressible and incompressible flow, to run on GPUs. The code is primarily written in Fortran 77 and has […]
Mar, 8

Accelerating H.264 inter prediction in a GPU by using CUDA

H.264/AVC defines a very efficient algorithm for inter prediction, but it is very time-consuming. With the emergence of general-purpose graphics processing units (GPGPU), a new door has been opened to running this video algorithm on these small processing units. In this paper, a step forward is taken towards an implementation of the […]
Mar, 8

Offloading Region Matching of Data Distribution Management with CUDA

Data distribution management (DDM) aims to reduce the transmission of irrelevant data between High Level Architecture (HLA) compliant simulators by taking their regions of interest into account (i.e., region matching). In a large-scale simulation, computation-intensive region matching has a direct impact on simulation performance. To deal with the high computation cost of region […]
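Region matching itself is easy to state, which is why its cost comes from the number of pairs rather than from any single test. As a minimal sketch, assuming each region is an axis-aligned box given by one extent per dimension (treated here as a half-open interval [lower, upper)), two regions match only if their extents overlap in every dimension, and brute-force matching tests every (update, subscription) pair. The names `Region`, `overlaps` and `match_regions` are hypothetical and not taken from the paper; this is not the paper's CUDA method, only the computation it offloads.

```python
from typing import List, Tuple

# Hypothetical region layout: one (lower, upper) extent per dimension,
# each treated as the half-open interval [lower, upper).
Region = List[Tuple[float, float]]


def overlaps(a: Region, b: Region) -> bool:
    """Two regions match only if their extents overlap in every dimension."""
    return all(lo_a < hi_b and lo_b < hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def match_regions(updates: List[Region],
                  subscriptions: List[Region]) -> List[Tuple[int, int]]:
    """Brute-force region matching: test every (update, subscription) pair.

    Each pair is independent of the others, so on a GPU one thread could
    handle one pair; this sequential version only shows what is computed.
    """
    return [(u, s)
            for u, ur in enumerate(updates)
            for s, sr in enumerate(subscriptions)
            if overlaps(ur, sr)]


if __name__ == "__main__":
    updates = [[(0, 10), (0, 10)], [(20, 30), (20, 30)]]
    subscriptions = [[(5, 15), (5, 15)], [(25, 40), (0, 22)]]
    print(match_regions(updates, subscriptions))  # [(0, 0), (1, 1)]
```

The U×S pairwise loop is the part that grows quickly in a large-scale simulation, and its independence between pairs is what makes it a natural candidate for a GPU.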
Mar, 8

Preliminary implementation of VQ image coding using GPGPU

GPGPU (general-purpose computing on graphics processing units), which uses GPUs for general-purpose computations such as numerical calculations as well as for graphics processing, attracts a great deal of attention. In this paper, as an example of hierarchical clustering algorithms, we evaluate PNN (pairwise nearest neighbor) on GPUs using CUDA (Compute Unified Device Architecture). We also […]
Mar, 8

Frame-based parallelization of MPEG-4 on compute unified device architecture (CUDA)

Due to its object-based nature, flexible features and provision for user interaction, the MPEG-4 encoder is highly suitable for parallelization. The most critical and time-consuming operation of the encoder is motion estimation. Nvidia’s general-purpose graphics processing unit (GPGPU) architecture allows for a massively parallel stream processor model at a very low price (in a few thousands […]
Mar, 7

IP routing processing with graphic processors

Throughput and programmability have always been the central, but generally conflicting concerns for modern IP router designs. Current high performance routers depend on proprietary hardware solutions, which make it difficult to adapt to ever-changing network protocols. On the other hand, software routers offer the best flexibility and programmability, but could only achieve a throughput one […]
Mar, 7

Application-guided tool development for architecturally diverse computation

Architecturally diverse computation exploits non-traditional computing platforms (e.g., field-programmable gate arrays, graphics processors, heterogeneous chip multiprocessors) to execute user applications. We have designed the Auto-Pipe tool set with the goal of easing the task of developing applications for architecturally diverse systems. Prior to and during the course of Auto-Pipe’s design, we have developed a number […]
Mar, 7

Non-blocking programming on multi-core graphics processors: (extended abstract)

This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models […]
Mar, 7

CUDA-based AES parallelization with fine-tuned GPU memory utilization

Current graphics processing units (GPUs) offer great potential for speeding up computationally intensive data-parallel applications compared with traditional parallelization approaches, since GPUs contain many more hardware threads than the computational cores available to common CPU threads. NVIDIA developed a generic GPU programming platform, CUDA, which allows programmers to utilize the GPU through C programming […]
Mar, 7

Designing scalable many-core parallel algorithms for min graphs using CUDA

Removing redundant edges in a large graph is a fundamental problem in many practical applications such as verification of real-time systems and network routing. In this paper, we present the designs of scalable and efficient parallel algorithms for multiple many-core GPU devices using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. […]
Mar, 7

A tile-based parallel Viterbi algorithm for biological sequence alignment on GPU with CUDA

The Viterbi algorithm is the compute-intensive kernel in Hidden Markov Model (HMM) based sequence alignment applications. In this paper, we investigate extending several parallel methods, such as the wave-front and streaming methods for the Smith-Waterman algorithm, to achieve a significant speed-up on a GPU. The wave-front method can take advantage of the computing power of […]
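The wave-front method mentioned here rests on one dependency observation: in a Smith-Waterman table, cell H[i][j] depends only on H[i-1][j-1], H[i-1][j] and H[i][j-1], all of which lie on earlier anti-diagonals. As a minimal sketch, assuming a linear gap penalty and illustrative scores (match = 2, mismatch = -1, gap = -1, all assumptions), the table below is filled one anti-diagonal at a time. It illustrates the general wave-front idea for Smith-Waterman only; it is not the paper's tile-based Viterbi kernel, and the function name is hypothetical.

```python
from typing import List


def smith_waterman_wavefront(a: str, b: str, match: int = 2,
                             mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score, filling the DP table by anti-diagonals.

    All cells with the same d = i + j depend only on diagonals d - 1 and
    d - 2, so they are mutually independent and could be computed by
    parallel GPU threads; here the loop over one diagonal is sequential.
    """
    n, m = len(a), len(b)
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):  # anti-diagonal index d = i + j
        # Cells (i, j) with 1 <= i <= n, 1 <= j <= m and i + j == d.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_wavefront("ACGT", "AGT"))  # 5
```

The traversal order is the only difference from a row-by-row fill, so the scores are identical; what changes is that each diagonal exposes a batch of independent cells.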
Mar, 7

Efficient parallel algorithms for maximum-density segment problem

One of the fundamental problems involving DNA sequences is to find high-density segments of certain widths, for example, regions rich in guanine and cytosine (GC). Formally, given a sequence, each element of which has a value and a width, the maximum-density segment problem asks for the segment with the maximum density while satisfying […]
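The definition above can be made concrete with a sequential reference. As a minimal sketch, assuming the truncated constraint is the common one that a segment's total width must lie between a lower bound `w_min` and an upper bound `w_max` (an assumption, since the abstract is cut off), the density of a segment is the sum of its values divided by the sum of its widths. This quadratic brute force with prefix sums is only a baseline for checking results, not the efficient parallel algorithm the paper proposes; the function name is hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def max_density_segment(values: List[float], widths: List[float],
                        w_min: float, w_max: float
                        ) -> Optional[Tuple[int, int, float]]:
    """Return (i, j, density) for the densest segment values[i..j] (inclusive).

    Density is sum(values[i..j]) / sum(widths[i..j]); only segments whose
    total width lies in [w_min, w_max] (an assumed constraint) count.
    Widths must be positive. Returns None if no segment qualifies.
    """
    # Prefix sums give O(1) segment totals: sum over [i, j] = P[j + 1] - P[i].
    pv = [0.0] + list(accumulate(values))
    pw = [0.0] + list(accumulate(widths))
    best = None
    for i in range(len(values)):
        for j in range(i, len(values)):
            width = pw[j + 1] - pw[i]
            if width > w_max:
                break  # widths are positive, so extending j only widens it
            if width >= w_min:
                density = (pv[j + 1] - pv[i]) / width
                if best is None or density > best[2]:
                    best = (i, j, density)
    return best


if __name__ == "__main__":
    print(max_density_segment([1, 3, 2, 0], [1, 1, 1, 1], 2, 3))  # (1, 2, 2.5)
```

For GC content, one could set each value to 1 for G or C and 0 otherwise, with unit widths, so that density becomes the GC fraction of the segment.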

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
