Tradeoffs in designing accelerator architectures for visual computing

Aqeel Mahesri, Daniel Johnson, Neal Crago, Sanjay J. Patel

Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL

41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), 2008

DOI:10.1109/MICRO.2008.4771788

@inproceedings{mahesri2008tradeoffs,
  title={Tradeoffs in designing accelerator architectures for visual computing},
  author={Mahesri, A. and Johnson, D. and Crago, N. and Patel, S.J.},
  booktitle={Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture},
  pages={164--175},
  year={2008},
  organization={IEEE Computer Society}
}

Visualization, interaction, and simulation (VIS) constitute a class of applications that is growing in importance. This class includes applications such as graphics rendering, video encoding, simulation, and computer vision. These applications are ideally suited for accelerators because of their parallelizability and demand for high throughput. We compile a benchmark suite, VISBench, to serve as a proxy for this application class. We use VISBench to examine some important high-level decisions for an accelerator architecture. We propose a highly parallel base architecture. We examine the need for synchronization and data communication. We also examine GPU-style SIMD execution and find that a MIMD architecture usually performs better. Given these high-level choices, we use VISBench to explore the microarchitectural design space. We analyze area versus performance tradeoffs in designing individual cores and the memory hierarchy. We find that a design made of small, simple cores achieves much higher throughput than a general-purpose uniprocessor. Further, we find that a limited amount of support for ILP within each core aids overall performance. We find that fine-grained multithreading improves performance, but only up to a point. We find that word-level (SSE-style) SIMD provides a poor performance-to-area ratio. Finally, we find that sufficient memory and cache bandwidth is essential to performance.
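One commonly cited reason GPU-style SIMD can trail MIMD is branch divergence. The toy model below is an illustrative sketch only, not the paper's methodology or data: the path names, cycle costs, group width, and function names are all hypothetical. It assumes a lockstep SIMD group must execute every distinct control path taken by any of its lanes, while independent MIMD cores each run only their own path. It ignores area and every other hardware cost, which a real comparison would have to weigh.

```python
# Toy divergence model (hypothetical; not taken from the paper).
# Each thread follows one control path; each path has a fixed cycle cost.

def simd_group_cycles(paths, cost):
    """Lockstep group: every distinct path taken by any lane runs serially."""
    return sum(cost[p] for p in set(paths))

def mimd_group_cycles(paths, cost):
    """Independent cores: the group finishes when its slowest thread does."""
    return max(cost[p] for p in paths)

def total_cycles(thread_paths, width, cost, group_fn):
    """Process threads in consecutive groups of `width` and sum the group times."""
    total = 0
    for start in range(0, len(thread_paths), width):
        total += group_fn(thread_paths[start:start + width], cost)
    return total

# Assumed example workload: path "a" costs 10 cycles, path "b" costs 30.
cost = {"a": 10, "b": 30}
thread_paths = ["a", "b", "a", "a", "b", "b", "b", "b"]

simd_total = total_cycles(thread_paths, 4, cost, simd_group_cycles)
mimd_total = total_cycles(thread_paths, 4, cost, mimd_group_cycles)
print("SIMD cycles:", simd_total)
print("MIMD cycles:", mimd_total)
```

In this model the first group of four threads diverges, so the SIMD version pays for both paths, while the second group is uniform and costs the same either way. The gap therefore depends entirely on how often threads within a group take different paths.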

Tags: Benchmarking, CMP, Computer science, Computer vision, Rendering, Video encoding, Visualization

August 4, 2011 by hgpu
