Posts

Jan, 19

Relational query coprocessing on graphics processors

Graphics processors (GPUs) have recently emerged as powerful coprocessors for general purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. Moreover, new-generation GPUs allow writes to random memory locations, provide efficient interprocessor communication through on-chip local memory, and support a general purpose parallel programming […]
Jan, 18

Secure 3D graphics for virtual machines

In this paper a new approach to API remoting for GPU virtualisation is described which aims to reduce the amount of trusted code involved in 3D rendering for guest VMs. To achieve this it uses a modular driver framework to export large proportions of complex 3D graphics drivers into the guest’s domain. It further provides […]
Jan, 18

Message passing for GPGPU clusters: CudaMPI

We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an MPI-like message passing interface to communicate data stored on the graphics cards of a distributed-memory parallel computer. These libraries can help applications that perform general purpose computations on these networked GPU clusters. We explore how to efficiently support both point-to-point and […]
Jan, 18

Blasting through lattice calculations using CUDA

Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors are now making available development tools to support general purpose high performance computing. Nvidia’s CUDA platform, in particular, offers direct access to graphics hardware through a programming language similar to C. Using the CUDA platform […]
Jan, 18

An exploration of CUDA and CBEA for a gravitational wave data-analysis application (Einstein@Home)

We present a detailed approach for making use of two new computer hardware architectures, CBEA and CUDA, for accelerating a scientific data-analysis application (Einstein@Home). Our results suggest that both architectures suit the application quite well and that the performance achievable within the same software development time-frame is nearly identical.
Jan, 18

A novel approach for implementing Steganography with computing power obtained by combining Cuda and Matlab

With the current development of multiprocessor systems, the demand for computing data on such processors has also increased exponentially. If multi-core processors are not fully utilized, then even though the computing power exists, the speed is not available to end users for their respective applications. In accordance with this, the users or […]
Jan, 18

A Multi-Stage CUDA Kernel for Floyd-Warshall

We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. In order to achieve this speedup, we applied a new technique to reduce usage of on-chip shared memory and allow the CUDA scheduler to more effectively hide instruction […]
Jan, 18

Fast GPGPU Data Rearrangement Kernels using CUDA

Many high-performance computing algorithms are bandwidth limited, hence the need for optimal data rearrangement kernels as well as their easy integration into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging […]
Jan, 18

Parallelization of Weighted Sequence Comparison by using EBWT

The Extended Burrows-Wheeler transform (EBWT) helps to find the distance between two sequences. Implementation of an existing algorithm takes a considerable amount of time for small-size sequences. In this paper, we give a parallel implementation of this algorithm using NVIDIA Compute Unified Device Architecture (CUDA). We have obtained, on average, a 2X improvement […]
Jan, 18

GPGPUs in computational finance: Massive parallel computing for American style options

The pricing of American style and multiple exercise options is a very challenging problem in mathematical finance. One usually employs a Least-Square Monte Carlo approach (Longstaff-Schwartz method) for the evaluation of conditional expectations which arise in the Backward Dynamic Programming principle for such optimal stopping or stochastic control problems in a Markovian framework. Unfortunately, these […]
Jan, 18

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms […]
Jan, 17

Automatic bi-layer video segmentation based on sensor fusion

We propose a new solution to the problem of bi-layer video segmentation in terms of both hardware design and algorithmic solution. At the data acquisition stage, we combine color video with infrared video, which is robust to illumination changes and provides an automatic initialization of the cue map for foreground-background segmentation. Two algorithms are presented […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
