A Performance Criteria for parallel Computation on basis of block size using CUDA Architecture

Ashis Kumar Dash
International Journal of Scientific & Engineering Research, Volume 5, Issue 1, 2014








A GPU based on the CUDA architecture developed by NVIDIA is a high-performance computing device. Multiplication of large matrices can be computed in a few seconds on such a GPU. A modern GPU consists of 16 highly threaded streaming multiprocessors (SMs). The GPU named Fermi consists of 32 SMs. These are compute-intensive devices. GPUs have been found to be the best platform for massive data parallelism. The CUDA architecture is based on a heterogeneous platform comprising both CPU and GPU, which offers enormous potential to solve complex problems at high speed. In most applications the sequential part of a program is executed on the CPU and the numerically intensive part on the GPU. However, merely executing the numerically intensive part on the GPU will not by itself increase the performance of the computation. Since a GPU consists of highly threaded multiprocessors, threads must be well organised into blocks, and blocks into grids, to maximize the performance of parallel computation, depending on the architecture of the GPU. In this paper the organization of threads on a particular GPU is discussed, and the block size that maximizes the performance of parallel computation is determined using matrix multiplication.
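
To make the role of block size concrete, the following is a minimal sketch, not taken from the paper, of the launch arithmetic for a square matrix of order n when each thread computes one output element and threads are grouped into square blocks. The function name `launch_config`, the `LaunchConfig` fields, and the limits used (32 threads per warp, at most 1024 threads per block, as on devices of compute capability 2.0 and later) are assumptions made for illustration. The paper's own measurements and chosen block size are not reproduced here.

```python
from dataclasses import dataclass

# Assumed hardware limits (illustrative, not taken from the paper).
WARP_SIZE = 32                 # threads scheduled together as one warp
MAX_THREADS_PER_BLOCK = 1024   # assumed per-block limit (compute capability >= 2.0)


@dataclass
class LaunchConfig:
    block_edge: int          # threads along one side of a square block
    threads_per_block: int   # block_edge * block_edge
    grid_edge: int           # blocks along one side of the square grid
    total_blocks: int        # grid_edge * grid_edge
    fills_whole_warps: bool  # True if the block is a whole number of warps
    idle_threads: int        # launched threads that fall outside the matrix


def launch_config(n: int, block_edge: int) -> LaunchConfig:
    """Hypothetical helper: grid and block shape for an n x n matrix product
    with one thread per output element."""
    if n <= 0 or block_edge <= 0:
        raise ValueError("n and block_edge must be positive")
    threads = block_edge * block_edge
    if threads > MAX_THREADS_PER_BLOCK:
        raise ValueError("block exceeds the assumed per-block thread limit")
    # Round up so that the grid covers the whole matrix.
    grid_edge = (n + block_edge - 1) // block_edge
    total_blocks = grid_edge * grid_edge
    return LaunchConfig(
        block_edge=block_edge,
        threads_per_block=threads,
        grid_edge=grid_edge,
        total_blocks=total_blocks,
        fills_whole_warps=(threads % WARP_SIZE == 0),
        idle_threads=total_blocks * threads - n * n,
    )


if __name__ == "__main__":
    for edge in (8, 10, 16, 32):
        print(launch_config(1000, edge))
```

Under these assumptions, a 16 x 16 block gives 256 threads (eight full warps) and needs a 63 x 63 grid for a matrix of order 1000, whereas a 10 x 10 block covers the matrix exactly but leaves its last warp partly empty. Which of these trade-offs performs best on a given GPU is an empirical question of the kind the paper addresses.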
