Posts
May, 31
Distributed time, conservative parallel logic simulation on GPUs
Logical simulation is the primary method to verify the correctness of IC designs. However, today’s complex VLSI designs place ever higher demands on the throughput of logic simulators. In this work, a parallel logic simulator was developed by leveraging the computing power of modern graphics processing units (GPUs). To expose more parallelism, we implemented a […]
May, 31
Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs
This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphics processing units (GPUs) with NVIDIA Corporation’s Compute Unified Device Architecture (CUDA). Central to this is a divide-and-conquer approach which divides the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms) each running efficiently […]
May, 31
Connected component identification and cluster update on GPU
Cluster identification tasks occur in a multitude of contexts in physics and engineering, such as cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as […]
May, 30
Parallel ant colony for nonlinear function optimization with graphics hardware acceleration
This paper presents a massively parallel ant colony optimization – pattern search (ACO-PS) algorithm with graphics hardware acceleration on nonlinear function optimization problems. The objective of this study is to determine the effectiveness of using graphics processing units (GPUs) as a hardware platform for ACO-PS. The GPU, the common graphics hardware found in modern personal computers, […]
May, 30
A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality
We present a development environment for distributed GPU computing targeted at multi-GPU systems as well as graphics clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely, the PCI bus and network interconnects. While the extended API mimics the full function set […]
May, 30
A Micro-benchmark Suite for AMD GPUs
Optimizing programs for Graphics Processing Units (GPUs) requires thorough knowledge about the values of architectural features for the new computing platform. However, this knowledge is frequently unavailable, e.g., due to insufficient documentation, which is probably a result of the infancy of general purpose computing on the GPU. What makes the modeling of program performance on […]
May, 30
Real-Time Rendering and Manipulation of Large Terrains
Terrains are challenging geometric objects for real-time rendering and interactive manipulation. State-of-the-art terrain rendering systems use custom multi-resolution representations like geometry clipmaps for fast rendering on the GPU. In this paper, we present a system that exploits the power and flexibility of modern GPUs to store, render, and manipulate terrains with minimal CPU involvement. […]
May, 30
Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing
Markov clustering is becoming a key algorithm within bioinformatics for determining clusters in networks. For instance, clustering protein interaction networks is helping find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability issues are becoming a critical limiting […]
May, 30
Evaluating the potential of graphics processors for high performance embedded computing
Today’s high performance embedded computing applications are posing significant challenges for processing throughput. Traditionally, such applications have been realized on application specific integrated circuits (ASICs) and/or digital signal processors (DSPs). However, ASICs’ advantage in performance and power often could not justify the fast-increasing fabrication cost, while current DSPs offer a limited processing throughput that […]
May, 30
High performance comparison-based sorting algorithm on many-core GPUs
Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU […]
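The two stages named in the abstract, a bitonic sort followed by a merge sort, can be illustrated with a small sequential sketch. This is not GPU-Warpsort itself: it runs on the CPU, and the block size, the function names `bitonic_sort` and `block_sort_then_merge`, the infinity padding for a short final block (which assumes numeric keys), and the use of `heapq.merge` for the merge stage are all assumptions made for illustration.

```python
import heapq
import math

BLOCK = 8  # hypothetical block size; must be a power of two


def bitonic_sort(block):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(block)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = (
                        a[i] > a[partner] if ascending else a[i] < a[partner]
                    )
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def block_sort_then_merge(values, block=BLOCK):
    """Bitonic-sort fixed-size blocks, then merge the sorted blocks."""
    sorted_blocks = []
    for start in range(0, len(values), block):
        chunk = list(values[start:start + block])
        pad = block - len(chunk)
        chunk.extend([math.inf] * pad)   # pad a short final block (numeric keys assumed)
        sorted_chunk = bitonic_sort(chunk)
        sorted_blocks.append(sorted_chunk[:block - pad])  # drop the padding again
    return list(heapq.merge(*sorted_blocks))
```

In the inner loop every compare-exchange within one `(k, j)` step touches a disjoint pair of elements, which is what makes bitonic networks a natural fit for data-parallel hardware; the sketch simply executes those independent steps one after another.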
May, 30
VHF SAR image formation implemented on a GPU
This paper will describe how off-the-shelf 3D graphics cards can be used for scientific computation such as SAR processing. In particular, a highly efficient one-dimensional FFT and a fast direct (global) backprojection implementation will be presented and analyzed.
May, 30
GPU-Accelerated Background Generation Algorithm with Low Latency
A background model is constructed to detect moving objects in video sequences. Subsequent frames are stacked on top of each other using associated registration information that must be obtained in a preprocessing step. The change of the brightness value in each pixel over time might be caused by a moving object. Current graphics processing units […]