
Posts

Feb, 19

Programmability: Design Costs and Payoffs using AMD GPU Streaming Languages and Traditional Multi-Core Libraries

GPGPUs and multi-core processors have come to the forefront of interest in scientific computing. Graphics processors have become programmable, allowing exploitation of their large amounts of memory bandwidth and thread level parallelism in general purpose computing. This paper explores these two architectures, the languages used to program them, and the optimizations used to maximize performance […]
Feb, 19

Decoupled Access/Execute Metaprogramming for GPU-Accelerated Systems

We describe the evaluation of several implementations of a simple image processing filter on an NVIDIA GTX 280 card. Our experimental results show that performance depends significantly on low-level details such as data layout and iteration space mapping which complicate code development and maintenance. We propose extending a CUDA or OpenCL like model with decoupled […]
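As a rough illustration of why these low-level details matter (a generic sketch, not code from the paper), the two CUDA kernels below apply the same trivial per-pixel operation with different thread-to-pixel mappings: the first produces coalesced global memory accesses, the second does not, and the gap between them is exactly the kind of mapping sensitivity the abstract describes. All names and dimensions are hypothetical.

// Mapping A: threadIdx.x runs along image rows, so adjacent threads read adjacent words (coalesced).
__global__ void scale_coalesced(const float *in, float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (x < w && y < h)
        out[y * w + x] = 2.0f * in[y * w + x];
}

// Mapping B: threadIdx.x runs down image columns, so adjacent threads stride by w words (uncoalesced).
__global__ void scale_strided(const float *in, float *out, int w, int h)
{
    int y = blockIdx.x * blockDim.x + threadIdx.x;   // row index
    int x = blockIdx.y * blockDim.y + threadIdx.y;   // column index
    if (x < w && y < h)
        out[y * w + x] = 2.0f * in[y * w + x];
}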
Feb, 19

Compiler Support for High-level GPU Programming

We design a high-level abstraction of CUDA, called hiCUDA, using compiler directives. It simplifies the task of porting sequential applications to NVIDIA GPUs. This paper focuses on the design and implementation of a source-to-source compiler that translates a hiCUDA program into an equivalent CUDA program, and shows that the performance of CUDA code generated by […]
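For context only (this is plain CUDA, not hiCUDA's directive syntax), the sketch below shows the kind of host-side boilerplate, device allocation, copy-in, launch configuration and copy-out, that a directive-based source-to-source compiler such as hiCUDA generates from an annotated sequential loop; the kernel and variable names are invented.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel a directive-based compiler might generate for "b[i] = 2*a[i]".
__global__ void scale2(const float *a, float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = 2.0f * a[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = new float[n], *h_b = new float[n];
    for (int i = 0; i < n; ++i) h_a[i] = float(i);

    // Explicit allocation, transfers and launch: the boilerplate that
    // directive-based approaches hide behind pragmas on the sequential loop.
    float *d_a, *d_b;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    scale2<<<(n + 255) / 256, 256>>>(d_a, d_b, n);
    cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);
    printf("b[10] = %f\n", h_b[10]);
    cudaFree(d_a); cudaFree(d_b);
    delete[] h_a; delete[] h_b;
    return 0;
}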
Feb, 19

High Performance Relevance Vector Machine on GPUs

The Relevance Vector Machine (RVM) algorithm has been widely utilized in many applications, such as machine learning, image pattern recognition, and compressed sensing. However, the RVM algorithm is computationally expensive. We seek to accelerate the RVM computation for time-sensitive applications by utilizing massively parallel accelerators such as GPUs. In this paper, the computation […]
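For readers unfamiliar with where the cost comes from: in Tipping's standard RVM formulation (a general fact about the algorithm, not a detail taken from this paper), each training iteration re-estimates the weight posterior

\Sigma = \left( \mathbf{A} + \beta \, \Phi^{\top} \Phi \right)^{-1}, \qquad
\mu = \beta \, \Sigma \, \Phi^{\top} \mathbf{t}

where Φ is the N×M design matrix, A = diag(α_1, ..., α_M) holds the hyperparameters and β is the noise precision. The repeated M×M inversion and the Φ^T Φ products are the dense linear-algebra kernels that map naturally onto a GPU.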
Feb, 19

A Generic Approach for Developing Highly Scalable Particle-Mesh Codes for GPUs

We present a general framework for GPU-based low-latency data transfer schemes that can be used for a variety of particle-mesh algorithms [8]. This framework makes it possible to hide the latency of data transfers between GPU-accelerated computing nodes by interleaving them with kernel execution on the GPU. We discuss as an example the fully relativistic […]
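The overlap idea can be illustrated with ordinary CUDA streams. This is a minimal generic sketch, not the paper's particle-mesh framework; the process kernel, buffer names and sizes are invented.

#include <cuda_runtime.h>

// Placeholder kernel standing in for the per-chunk mesh/particle work.
__global__ void process(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f;
}

void pipeline()
{
    const int chunkElems = 1 << 20, nChunks = 16;
    float *h_in, *h_out, *d_in[2], *d_out[2];
    cudaMallocHost(&h_in,  (size_t)nChunks * chunkElems * sizeof(float));  // pinned, so async copies can overlap
    cudaMallocHost(&h_out, (size_t)nChunks * chunkElems * sizeof(float));
    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc(&d_in[i],  chunkElems * sizeof(float));
        cudaMalloc(&d_out[i], chunkElems * sizeof(float));
        cudaStreamCreate(&s[i]);
    }

    // Ping-pong over two streams: while one chunk is being transferred,
    // the previous chunk's kernel is still executing on the other stream.
    for (int c = 0; c < nChunks; ++c) {
        int i = c % 2;
        cudaMemcpyAsync(d_in[i], h_in + (size_t)c * chunkElems,
                        chunkElems * sizeof(float), cudaMemcpyHostToDevice, s[i]);
        process<<<(chunkElems + 255) / 256, 256, 0, s[i]>>>(d_in[i], d_out[i], chunkElems);
        cudaMemcpyAsync(h_out + (size_t)c * chunkElems, d_out[i],
                        chunkElems * sizeof(float), cudaMemcpyDeviceToHost, s[i]);
    }
    for (int i = 0; i < 2; ++i) cudaStreamSynchronize(s[i]);
}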
Feb, 19

GPU Accelerated Scalable Parallel Random Number Generators

SPRNG (Scalable Parallel Random Number Generators) is widely used in computational science applications, particularly on parallel systems. The LFG and LCG are two frequently used random number generators in this library. In this paper, LFG and LCG are implemented on GPUs in CUDA. As a library for providing random numbers to GPU scientific applications, GASPRNG […]
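The LCG side is easy to sketch: a 64-bit linear congruential recurrence x_{n+1} = a*x_n + c (mod 2^64), with one independent state per thread. The constants and stream layout below are generic illustrations, not GASPRNG's actual parameterization or leapfrog scheme.

#include <cuda_runtime.h>

// Generic 64-bit LCG: x_{n+1} = a*x_n + c, modulus 2^64 via unsigned overflow.
// Constants are Knuth's MMIX values, used here purely as an example.
__device__ __forceinline__ unsigned long long lcg_next(unsigned long long &state)
{
    state = 6364136223846793005ULL * state + 1442695040888963407ULL;
    return state;
}

// Each thread owns an independent state and fills a strided slice of the output.
__global__ void fill_uniform(float *out, unsigned long long *states, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned long long s = states[tid];
    for (int i = tid; i < n; i += gridDim.x * blockDim.x)
        out[i] = (lcg_next(s) >> 11) * (1.0f / 9007199254740992.0f);  // top 53 bits mapped to [0,1)
    states[tid] = s;   // persist the state for the next call
}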
Feb, 19

Faster File Matching using GPGPUs

We address the problem of file matching by modifying the MD6 algorithm, which is well suited to take advantage of GPU computing. MD6 is a cryptographic hash function that is tree-based and highly parallelizable. When the message M is available initially, the hashing operations can be initiated at different starting points within the message and […]
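The parallelism comes from the tree structure: message chunks are compressed independently at the leaves and the results are combined level by level, so every level is a batch of independent work items. The skeleton below shows only that reduction shape; compress() is a toy stand-in, not the real MD6 compression function.

#include <array>
#include <cstdint>
#include <vector>

// Toy stand-in for the MD6 compression function (NOT the real primitive):
// it just mixes its inputs so the tree-reduction shape can be shown.
using Digest = std::array<uint64_t, 2>;

static Digest compress(const Digest &a, const Digest &b)
{
    return { (a[0] * 0x9E3779B97F4A7C15ULL) ^ b[0],
             (a[1] * 0xC2B2AE3D27D4EB4FULL) ^ b[1] };
}

// Tree hash: leaf digests (already-hashed message chunks) are combined
// pairwise, level by level; each level is embarrassingly parallel, which is
// what a GPU implementation exploits.
static Digest tree_hash(std::vector<Digest> level)
{
    if (level.empty()) return {};
    while (level.size() > 1) {
        std::vector<Digest> next;
        for (size_t i = 0; i + 1 < level.size(); i += 2)
            next.push_back(compress(level[i], level[i + 1]));  // independent work items
        if (level.size() % 2) next.push_back(level.back());    // odd node promoted unchanged
        level.swap(next);
    }
    return level.front();
}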
Feb, 19

Efficiency Considerations of Cauchy Reed-Solomon Implementations on Accelerator and Multi-Core Platforms

The Cauchy variant of the Reed-Solomon algorithm is implemented on accelerator platforms including GPGPU, FPGA, CellBE and ClearSpeed, as well as on an x86 multi-core system. The sustained throughput performance and kernel rates are measured for a 5+3 Reed-Solomon scheme. To compare the different technology platforms, an efficiency metric is introduced and the platforms are categorized […]
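One reason the Cauchy variant ports well to all of these platforms is that it replaces Galois-field multiplications with XORs selected by a binary coding matrix. The CUDA sketch below shows a much-simplified 5+3 layout (5 data blocks, 3 parity blocks) and ignores the bit-matrix packing of a real implementation; names and the matrix contents are hypothetical.

#include <cuda_runtime.h>

#define K 5   // data blocks
#define M 3   // parity blocks

// 0/1 entries derived from the Cauchy coding matrix, uploaded by the host.
__constant__ int codingMatrix[M][K];

// Simplified encoder: parity word w of block p is the XOR of the data words
// selected by the coding matrix. One thread per word offset.
__global__ void encode(const unsigned int *data,    // K blocks, each wordsPerBlock long
                       unsigned int *parity,        // M blocks, each wordsPerBlock long
                       int wordsPerBlock)
{
    int w = blockIdx.x * blockDim.x + threadIdx.x;
    if (w >= wordsPerBlock) return;
    for (int p = 0; p < M; ++p) {
        unsigned int acc = 0;
        for (int d = 0; d < K; ++d)
            if (codingMatrix[p][d])
                acc ^= data[d * wordsPerBlock + w];   // pure XOR, no GF multiplies
        parity[p * wordsPerBlock + w] = acc;
    }
}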
Feb, 19

Using GPU VSIPL & CUDA to Accelerate RF Clutter Simulation

This paper describes a flexible simulator for background Radio Frequency clutter developed at the Georgia Tech Research Institute, and how this simulation was accelerated on NVIDIA GPUs using GPU VSIPL. The paper describes the mathematical basis for the simulation and how it can be used to simulate RF environments and scenarios; introduces […]
Feb, 18

Accelerating Image Feature Comparisons using CUDA on Commodity Hardware

Given multiple images of the same scene, image registration is the process of determining the correct transformation to bring the images into a common coordinate system, i.e., how the images fit together. Feature-based registration applies a transformation function to the input images before performing the correlation step. The result of that transformation, also called feature extraction, […]
Feb, 18

Tetrahedral Interpolation for Deformable Image Registration on GPUs

We speed up the tetrahedral interpolation step of a deformable image registration code called MORFEUS. We implement several versions of the interpolation code on a Fermi GPU (GTX480). Despite the irregularity of the code, we obtained kernel speedups of up to 24.6x, 33.7x and 62.4x on three real-life benchmarks. These numbers do not include the […]
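The interpolation itself is standard barycentric blending: each query point receives w1*v1 + w2*v2 + w3*v3 + w4*v4 over the four vertices of its enclosing tetrahedron. The CUDA sketch below is generic (not MORFEUS's data layout) and assumes the enclosing tetrahedron and weights have already been computed for each point; the irregular gather through the vertex indices is the source of the difficulty the abstract mentions.

#include <cuda_runtime.h>

// One query point per thread: gather the four vertex displacements of the
// enclosing tetrahedron and blend them with precomputed barycentric weights.
__global__ void interpolate(const int4   *tetVerts,   // 4 vertex indices per tetrahedron
                            const float4 *weights,    // barycentric weights per query point
                            const int    *pointTet,   // enclosing tetrahedron per query point
                            const float3 *vertexDisp, // displacement vector at each mesh vertex
                            float3       *out,        // interpolated displacement per query point
                            int nPoints)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPoints) return;
    int4   v = tetVerts[pointTet[i]];                 // irregular, data-dependent gather
    float4 w = weights[i];
    float3 a = vertexDisp[v.x], b = vertexDisp[v.y],
           c = vertexDisp[v.z], d = vertexDisp[v.w];
    out[i] = make_float3(w.x * a.x + w.y * b.x + w.z * c.x + w.w * d.x,
                         w.x * a.y + w.y * b.y + w.z * c.y + w.w * d.y,
                         w.x * a.z + w.y * b.z + w.z * c.z + w.w * d.z);
}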
Feb, 18

Optimization of HEP codes on GPUs

Graphics processing units (GPUs) have evolved into high-performance co-processors that can be easily programmed with common high-level languages such as C, Fortran and C++. Today’s GPUs greatly outpace CPUs in arithmetic performance and memory bandwidth, making them the ideal co-processor to accelerate a variety of data-parallel applications. Here, we shall describe the application […]
