Posts
Apr, 19
Parallel Approaches for SWAMP Sequence Alignment
This document is a summary and overview of several approaches to implementing the local sequence alignment algorithms known as SWAMP and SWAMP+ on commercially available hardware. Using a Smith-Waterman style of alignment, these parallel algorithms have several innovative extensions that take advantage of the ASC associative computing model while maintaining speed, accuracy, and producing a […]
Apr, 19
A Hybrid Analytical DRAM Performance Model
As process technology scales, the number of transistors that can fit in a unit area has increased exponentially. Processor throughput, memory storage, and memory throughput have all been increasing at an exponential pace. As such, DRAM has become an ever-tightening bottleneck for applications with irregular memory access patterns. Computer architects in industry sometimes use ad […]
Apr, 19
Extending the Scalability of Single Chip Stream Processors with On-chip Caches
As semiconductor scaling continues, more transistors can be put onto the same chip despite growing challenges in clock frequency scaling. Stream processor architectures can make effective use of these additional resources for appropriate applications. However, it is important that programmer effort be amortized across future generations of stream processor architectures. Current industry projections suggest a […]
Apr, 19
A Task-centric Memory Model for Scalable Accelerator Architectures
This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors. Based on these insights, […]
Apr, 19
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in today’s desktop and notebook computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into […]
Apr, 19
Dynamic detection of uniform and affine vectors in GPGPU computations
We present a hardware mechanism that dynamically detects uniform and affine vectors used in SPMD architectures such as Graphics Processing Units, to minimize pressure on the register file and reduce power consumption with minimal architectural modifications. A preliminary experimental analysis conducted with the Barra simulator shows that this optimization can benefit up to 34% […]
Apr, 19
Parallel calculation of the median and order statistics on GPUs with application to robust regression
We present and compare various approaches to a classical selection problem on Graphics Processing Units (GPUs). The selection problem consists of selecting the $k$-th smallest element from an array of size $n$, called the $k$-th order statistic. We focus on calculating the median of a sample, the $n/2$-th order statistic. We introduce a new method based […]
Apr, 19
A pseudospectral matrix method for time-dependent tensor fields on a spherical shell
We construct a pseudospectral method for the solution of time-dependent, non-linear partial differential equations on a three-dimensional spherical shell. The problem we address is the treatment of tensor fields on the sphere. As a test case we consider the evolution of a single black hole in numerical general relativity. A natural strategy would be the […]
Apr, 18
Building a Personal High Performance Computer with Heterogeneous Processors
A personal high performance computer (PHPC) requires low cost and high performance. Teraflops PHPC systems with special accelerator units such as GPGPUs have been presented, but they suffer from difficulties in programming, compatibility, and applicability. In this paper, we present HPP-PHPC, a hybrid architecture of heterogeneous processors connected by a non-coherent off-chip system bus. The performance of HPP-PHPC […]
Apr, 18
Practical Pre-stack Kirchhoff Time Migration of Seismic Processing on General Purpose GPU
In this paper, we introduced three prototype GPGPU solutions on the NVIDIA GeForce 8800 GT for a practical Pre-stack Kirchhoff Time Migration program. We presented how to redesign and reimplement the original CPU code as efficient GPU code. The prototypes are up to 7.2 times faster than the CPU version on Intel’s P4 3.0G.
Apr, 18
GPU detectors for interference cancellation in chaos-based CDMA communications
Multi-user detection is an effective technique to reduce the mutual interference between users in code division multiple access (CDMA) communications at the cost of a larger number of arithmetic operations. It is shown that multi-user detection can be efficiently computed on graphics processors using a GPGPU approach. Specifically, two GPU parallel interference cancellation detectors for […]
Apr, 18
Efficient characterizations of composite materials electrical properties based on GPU accelerated finite difference method
In this paper, a GPU-accelerated three-dimensional finite difference method is presented as an efficient approach to performing fast parallel simulations of composite materials. Using an NVIDIA GeForce 9800 series GPGPU and an optimized CUDA implementation, a considerable speed-up (>20) was observed for simulations of large problems. Further performance improvements could be achieved […]