high performance computing on graphics processing units: hgpu.org

Posts

Mar, 8

Porting of an Edge-Based CFD Solver to GPUs

Graphics processing units (GPUs) are increasingly becoming a mainstream platform for high performance computational fluid dynamics. This paper describes the porting of a substantial portion of FEFLO, an adaptive, edge-based finite element code for the solution of compressible and incompressible flow, to run on GPUs. The code is primarily written in Fortran 77 and has […]

CUDA

Mar, 8

Accelerating H.264 inter prediction in a GPU by using CUDA

H.264/AVC defines a very efficient algorithm for inter prediction, but it takes too much time. With the emergence of general-purpose graphics processing units (GPGPU), a new door has been opened to running this video algorithm on these small processing units. In this paper, a step forward is taken towards an implementation of the […]

CUDA

Mar, 8

Offloading Region Matching of Data Distribution Management with CUDA

Data distribution management (DDM) aims to reduce the transmission of irrelevant data between High Level Architecture (HLA) compliant simulators by taking their interesting regions into account (i.e., region matching). In a large-scale simulation, computation-intensive region matching would have a direct impact on the simulation performance. To deal with the high computation cost of region […]

CUDA

Mar, 8

Preliminary implementation of VQ image coding using GPGPU

GPGPU (general-purpose computing on graphics processing units), which uses GPUs for general-purpose computations such as numerical calculations as well as for graphics processing, is attracting a great deal of attention. In this paper, as an example of hierarchical clustering algorithms, we evaluate PNN (pairwise nearest neighbor) on GPUs by using CUDA (compute unified device architecture). We also […]

CUDA

Mar, 8

Frame-based parallelization of MPEG-4 on compute unified device architecture (CUDA)

Due to its object-based nature, flexible features and provision for user interaction, the MPEG-4 encoder is highly suitable for parallelization. The most critical and time-consuming operation of the encoder is motion estimation. Nvidia’s general-purpose graphical processing unit (GPGPU) architecture allows for a massively parallel stream processor model at a very cheap price (in a few thousands […]

CUDA

Mar, 7

IP routing processing with graphic processors

Throughput and programmability have always been the central, but generally conflicting concerns for modern IP router designs. Current high performance routers depend on proprietary hardware solutions, which make it difficult to adapt to ever-changing network protocols. On the other hand, software routers offer the best flexibility and programmability, but could only achieve a throughput one […]

CUDA

Mar, 7

Application-guided tool development for architecturally diverse computation

Architecturally diverse computation exploits non-traditional computing platforms (e.g., field-programmable gate arrays, graphics processors, heterogeneous chip multiprocessors) to execute user applications. We have designed the Auto-Pipe tool set with the goal of easing the task of developing applications for architecturally diverse systems. Prior to and during the course of Auto-Pipe’s design, we have developed a number […]

CUDA

Mar, 7

Non-blocking programming on multi-core graphics processors: (extended abstract)

This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models […]

CUDA

Mar, 7

CUDA-based AES parallelization with fine-tuned GPU memory utilization

Current graphics processing units (GPUs) offer great potential for speeding up computationally intensive data-parallel applications over traditional parallelization approaches, since GPUs contain many more hardware threads than there are computational cores available to common CPU threads. NVIDIA developed a generic GPU programming platform, CUDA, which allows programmers to utilize the GPU through C programming […]

CUDA

Mar, 7

Designing scalable many-core parallel algorithms for min graphs using CUDA

Removing redundant edges on a large graph is a fundamental problem in many practical applications such as verification of real-time systems and network routing. In this paper, we present the designs of scalable and efficient parallel algorithms for multiple many-core GPU devices using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. […]

CUDA

Mar, 7

A tile-based parallel Viterbi algorithm for biological sequence alignment on GPU with CUDA

The Viterbi algorithm is the compute-intensive kernel in Hidden Markov Model (HMM) based sequence alignment applications. In this paper, we investigate extending several parallel methods, such as the wave-front and streaming methods for the Smith-Waterman algorithm, to achieve a significant speed-up on a GPU. The wave-front method can take advantage of the computing power of […]

CUDA

Mar, 7

Efficient parallel algorithms for maximum-density segment problem

One of the fundamental problems involving DNA sequences is to find high density segments of certain widths, for example, those regions with intensive guanine and cytosine (GC). Formally, given a sequence, each element of which has a value and a width, the maximum-density segment problem asks for the segment with the maximum density while satisfying […]

CUDA