All-pairs shortest-paths for large graphs on the GPU

The all-pairs shortest-path problem is an intricate part of numerous practical applications. We describe a shared-memory, cache-efficient GPU implementation that solves transitive closure and the all-pairs shortest-path problem on directed graphs for large datasets. The proposed algorithmic design utilizes the resources available on the NVIDIA G80 GPU architecture using the CUDA API. Our […]
An integrated GPU power and performance model

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as […]
GPU as a General Purpose Computing Resource

In the last few years, GPUs (Graphics Processing Units) have developed rapidly. Their ever-increasing computing power and decreasing cost have attracted attention from both industry and academia. In addition to graphics applications, researchers are interested in using them for general-purpose computing. Recently, NVIDIA released a new computing architecture, CUDA (Compute Unified Device Architecture), for […]
Parallelization of cellular neural networks on GPU

Recently, cellular neural networks (CNNs) have been demonstrated to be a highly effective paradigm applicable in a wide range of areas. Typically, CNNs can be implemented using VLSI circuits, but this would unavoidably require additional hardware. On the other hand, we can also implement CNNs purely by software; this, however, would result in very low […]
Fast and scalable list ranking on the GPU

General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises to offer the highest performance per dollar. GPUs have been used extensively on regular problems that can be easily parallelized. In this paper, we describe two implementations of List Ranking, a […]
GPU-Based FFT Computation for Multi-Gigabit WirelessHD Baseband Processing

Next-generation Graphics Processing Units (GPUs) are being considered for non-graphics applications. Millimeter-wave (60 GHz) wireless networks that are capable of multi-gigabit per second (Gbps) transfer rates require significant baseband throughput. In this work, we consider the baseband of WirelessHD, a 60 GHz communications system, which can provide a data rate of […]
Real-time mesh simplification using the GPU

Recent advances in real-time rendering have allowed the GPU implementation of traditionally CPU-restricted algorithms, often with performance increases of an order of magnitude or greater. Such gains are achieved by leveraging the large-scale parallelism of the GPU towards applications that are well-suited for these streaming architectures. By contrast, mesh simplification has traditionally been viewed as […]
Cache and bandwidth aware matrix multiplication on the GPU

Recent advances in the speed and programmability of consumer-level graphics hardware have sparked a flurry of research that goes beyond the realm of image synthesis and computer graphics. We examine the use of the GPU (graphics processing unit) as a tool for scientific computing, by analyzing techniques for performing large matrix multiplies in GPU […]
The FFT on a GPU

The Fourier transform is a well known and widely used tool in many scientific and engineering fields. The Fourier transform is essential for many image processing techniques, including filtering, manipulation, correction, and compression. As such, the computer graphics community could benefit greatly from such a tool if it were part of the graphics pipeline. As […]
MinGPU: a minimum GPU library for computer vision

In the field of computer vision, it is becoming increasingly popular to implement algorithms, in sections or in their entirety, on a graphics processing unit (GPU). This is due to the superior speed GPUs offer compared to CPUs. In this paper, we present a GPU library, MinGPU, which contains all of the necessary functions to […]
Oct-tree Method on GPU

The kd-tree is a fundamental tool in computer science. Among other uses, applying the kd-tree search (oct-tree method) to the fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2) with a brute-force method to O(N log N) with the tree method where […]
GPU-based intrinsic collision detection for deformable surfaces

An intrinsic collision detection unit (ICDU) forms the bottom-most layer of a collision detection pipeline. The ICDU performs collision detection and computes collision information for primitive feature pairs of objects in a 3D dynamic environment. A significant amount of time can be spent by the ICDU during the collision detection process. In this paper, we […]
