3D Recursive Gaussian IIR on GPU and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications
Computer Science Department, University of California, Los Angeles
9th IEEE Symposium on Application Specific Processors (SASP 2011), 2011
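@inproceedings{cong20113d,
title={3D Recursive Gaussian IIR on GPU and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications},
author={Cong, J. and Huang, M. and Zou, Y.},
booktitle={9th IEEE Symposium on Application Specific Processors (SASP 2011)},
year={2011}
}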
GPU devices typically have a higher off-chip bandwidth than FPGA-based systems. Thus, GPUs should typically perform better for bandwidth-bounded, massively parallel applications. In this paper we present our implementations of a 3D recursive Gaussian IIR on multicore CPU, many-core GPU and multi-FPGA platforms. Our baseline implementation on the CPU features the smallest arithmetic computation (2 MADDs per dimension). Because this application is clearly bandwidth bounded, we show that the differences in the memory subsystems of the different platforms require different bandwidth optimization techniques. Our implementations on the GPU and FPGA platforms show a 26X and 33X speedup, respectively, over the optimized single-thread code on the CPU.
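To illustrate why so little arithmetic per sample leaves the filter bandwidth bounded, here is a minimal sketch of a separable recursive smoothing pass. It is not the authors' implementation: the first-order recurrence, the single coefficient `a`, the lack of gain normalization, and the names `smooth_line` and `smooth_volume` are all assumptions made for illustration. Under this assumed form, each axis needs one forward and one backward pass, and each pass costs one multiply-add per sample.

```python
def smooth_line(x, a):
    """Forward then backward first-order recursive pass over one line.

    Assumed recurrence (hypothetical, not taken from the paper):
    y[i] = x[i] + a * y[i-1], i.e. one multiply-add per sample per pass.
    """
    y = list(x)
    for i in range(1, len(y)):            # causal (forward) pass
        y[i] = y[i] + a * y[i - 1]
    for i in range(len(y) - 2, -1, -1):   # anti-causal (backward) pass
        y[i] = y[i] + a * y[i + 1]
    return y


def smooth_volume(vol, a):
    """Apply smooth_line along x, y and z of a nested-list volume vol[z][y][x]."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = [[list(row) for row in plane] for plane in vol]
    for z in range(nz):                   # x axis: each line is one inner list
        for y in range(ny):
            out[z][y] = smooth_line(out[z][y], a)
    for z in range(nz):                   # y axis: samples taken across rows
        for x in range(nx):
            col = smooth_line([out[z][y][x] for y in range(ny)], a)
            for y in range(ny):
                out[z][y][x] = col[y]
    for y in range(ny):                   # z axis: samples taken across planes
        for x in range(nx):
            col = smooth_line([out[z][y][x] for z in range(nz)], a)
            for z in range(nz):
                out[z][y][x] = col[z]
    return out
```

In this sketch every sample is read and written once per pass while only a single multiply-add is performed on it, and the y and z passes gather samples that are far apart in memory. That access pattern, rather than the arithmetic, is what a platform-specific bandwidth optimization would have to address.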
December 19, 2011 by hgpu