Quantifying the Energy Efficiency of Object Recognition and Optical Flow

Michael Anderson, Forrest Iandola, Kurt Keutzer

EECS Department, University of California, Berkeley

EECS Department, University of California, Berkeley, Technical Report No. UCB/EECS-2014-22, 2014

@techreport{Anderson:EECS-2014-22,
    author={Anderson, Michael and Iandola, Forrest and Keutzer, Kurt},
    title={Quantifying the Energy Efficiency of Object Recognition and Optical Flow},
    institution={EECS Department, University of California, Berkeley},
    year={2014},
    month={Mar},
    URL={http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-22.html},
    Number={UCB/EECS-2014-22}
}

In this report, we analyze the computational and performance aspects of current state-of-the-art object recognition and optical flow algorithms. First, we identify important algorithms for object recognition and optical flow; then we perform a pattern decomposition to identify their key computations. We include profiles of the runtime and energy efficiency (GFLOPS/W) of our implementations of these applications on a commercial architecture. Finally, we analyze the memory-bandwidth boundedness of optical flow to identify opportunities for communication-avoiding algorithms. Our results were measured on an Intel i7-4770K (Haswell) reference platform. A five-layer convolutional neural network used for object classification achieves 0.70 GFLOPS/W, which is 21% of the theoretical compute bound for this Haswell processor. For the Horn-Schunck, Lucas-Kanade, and Brox optical flow methods, our implementations achieve 0.0338, 0.0103, and 0.0203 GFLOPS/W, respectively. For Horn-Schunck optical flow, our implementation achieves 7.9% of the theoretical bandwidth bound (assuming no cross-iteration memory optimization) with the Jacobi solver, and 9.7% of the bandwidth bound with the conjugate-gradient solver. To improve performance, we will focus first on increasing bandwidth utilization, then on cross-iteration memory optimizations such as blocking and tiling the Jacobi solver and employing communication-avoiding linear solvers. We also compare the runtime-accuracy tradeoffs of the three optical flow methods. We find that each method has distinct advantages over the others in terms of this tradeoff, so we will continue to develop and support all three methods in the future.
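The Horn-Schunck Jacobi solver is a stencil computation that streams several full-image arrays through memory on every sweep, which is why it is analyzed against a bandwidth bound. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It uses a simple 4-neighbour average instead of the weighted 3x3 kernel of the original Horn-Schunck formulation and central-difference derivatives from `np.gradient`. The helper names (`horn_schunck_jacobi`, `jacobi_bytes_per_iteration`, `bandwidth_bound_seconds`) are hypothetical. The byte count models an idealized fused kernel on float32 data in which every array touched in a sweep moves between DRAM and the core exactly once (no cross-iteration reuse, no write-allocate traffic). The 25.6 GB/s figure is only an assumed peak bandwidth for a dual-channel DDR3-1600 system and is not necessarily the value used in the report.

```python
import numpy as np


def neighbor_average(f):
    """4-neighbour mean with replicated borders (a simplification of the
    weighted 3x3 averaging kernel used by Horn and Schunck)."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])


def horn_schunck_jacobi(im1, im2, alpha=1.0, iterations=100):
    """Estimate dense flow (u, v) between two grayscale frames (sketch)."""
    im1 = np.asarray(im1, dtype=np.float32)
    im2 = np.asarray(im2, dtype=np.float32)
    # Spatial derivatives of the mean frame; np.gradient returns d/drow, d/dcol.
    Iy, Ix = np.gradient(0.5 * (im1 + im2))
    It = im2 - im1
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2  # precomputed once, read every sweep
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    for _ in range(iterations):
        u_bar = neighbor_average(u)
        v_bar = neighbor_average(v)
        # Both updates use only the previous iterate, i.e. one Jacobi sweep.
        t = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v


def jacobi_bytes_per_iteration(height, width, bytes_per_value=4):
    """Assumed DRAM traffic per sweep with no reuse across iterations:
    read u, v, Ix, Iy, It, denom and write u, v (8 full arrays)."""
    arrays_touched = 6 + 2
    return arrays_touched * height * width * bytes_per_value


def bandwidth_bound_seconds(height, width, iterations, bandwidth_bytes_per_s):
    """Lower bound on runtime if the solver is purely bandwidth-limited."""
    return iterations * jacobi_bytes_per_iteration(height, width) / bandwidth_bytes_per_s


if __name__ == "__main__":
    # 25.6e9 B/s is an assumed peak (dual-channel DDR3-1600), for illustration.
    print(bandwidth_bound_seconds(480, 640, 100, 25.6e9))
```

For a 640x480 frame and 100 sweeps this model gives about 0.038 s. Under these assumptions, the fraction of the bandwidth bound that an implementation achieves would be this lower-bound time divided by its measured runtime. Note that the NumPy code itself allocates temporaries (`u_bar`, `v_bar`, `t`) and therefore moves more data than the fused-kernel model counts. Blocking or tiling several Jacobi sweeps reduces the bytes moved per sweep and so lowers the bound.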

Tags: Algorithms, ATI, ATI Radeon HD 7990, Computer science, Neural networks, OpenCL, Optical flow

April 7, 2014 by hgpu