
Performance Analysis and Tuning for General-Purpose Graphics Processing Units (GPGPU)

Hyesoon Kim, Richard Vuduc, Wen-mei Hwu, Sara Baghsorkhi
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Morgan & Claypool Publishers, 2013
@article{kim2013performance,
   title={Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)},
   author={Kim, H. and Vuduc, R. and Baghsorkhi, S. and Choi, J. and Hwu, W.},
   journal={Synthesis Lectures on Computer Architecture},
   volume={7},
   number={2},
   pages={1--96},
   year={2013},
   publisher={Morgan \& Claypool Publishers}
}

General-purpose graphics processing units (GPGPUs) have emerged as an important class of shared-memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to today's more traditional multicore systems, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens vs. 1-10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches/scratchpad memories (less than 1 megabyte vs. 1-10 megabytes). In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in previous shared-memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest a connection to GPGPU platforms. We aim to give architects hints for understanding the algorithmic aspects of GPGPU computing. We also provide detailed performance analysis and guide optimizations, from high-level algorithms down to low-level instruction-level optimizations. As a case study, we use the fast multipole method (FMM) for n-body particle simulations. We also briefly survey the state of the art in GPU performance analysis tools and techniques.


