Understanding the ISA impact on GPU Architecture
North Carolina State University
North Carolina State University, 2014
@article{kothiya2014understanding,
title={Understanding the ISA impact on GPU Architecture},
author={Kothiya, Mayank Vinodbhai},
year={2014}
}
The widespread acceptance of GPUs for parallel computation has created demand for general-purpose capabilities in GPUs. In response, industry is rapidly developing better architectures to support general-purpose processing on GPUs; NVIDIA, for example, has introduced the Tesla, Fermi, and Kepler architectures. General Purpose Graphics Processing Units (GPGPUs) are widely used in many application domains, such as neural networks, matrix computations, and graph algorithms [1]. This work studies the evolution of the ISA from the Tesla architecture to the Fermi architecture. The General Purpose GPU Simulator (GPGPUSim) supports the native ISA of the previous GPU generation (Tesla), but it currently does not support the native ISA of the Fermi architecture [2]. Our contribution is to extend GPGPUSim to use the native ISA for performance and functional simulation. Native control flow information is also useful for generating a reference model for hardware implementation. GPGPUSim does not use the control flow information present in the native ISA; instead, it extracts this information by analyzing PTX assembly instructions. Our simulator extension uses the control flow information present in the native ISA. The methodology for extending GPGPUSim involved studying the simulator and modifying it to support the target set of benchmarks. To understand program control flow, we reviewed relevant patents, which provided insight into the hardware support required to execute the native ISA. To validate the understanding gained from this literature study, GPGPUSim was extended and verified on selected benchmarks for the Fermi architecture. The semantics of various instructions were determined by correlating CUDA (Compute Unified Device Architecture, the programming language for NVIDIA's GPGPU architecture) code, PTX code, and native ISA code. On the selected benchmarks, the dynamic instruction counts of the Tesla and Fermi ISAs differ by around 26%.
This difference in dynamic instruction count offers insight into how the two ISAs differ. It is mainly caused by features of the Fermi architecture that do not require significant changes to the hardware: instruction fusion (for example, combining one multiply and one add into a single instruction), predication and uniform branching support, 32-bit multiplication, and better addressing modes.
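To make the fusion effect concrete, here is a minimal sketch of how fusing a multiply and a dependent add shrinks a dynamic instruction trace. It is an illustration under stated assumptions, not the thesis's methodology or GPGPUSim code: the `Instr` record, the `MUL`/`ADD`/`MAD` mnemonics, the register names, and the four-instruction loop body are all hypothetical. The 25% reduction it produces comes from this made-up trace, not from the benchmarks measured in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str      # hypothetical mnemonic: "LD", "MUL", "ADD", or fused "MAD"
    dst: str     # destination register
    srcs: tuple  # source operands


def fuse_mul_add(trace):
    """Peephole pass: fuse a MUL with the ADD that immediately consumes it.

    Assumption: the MUL result is dead after the ADD. A real compiler
    would confirm this with liveness analysis before fusing.
    """
    out = []
    i = 0
    while i < len(trace):
        cur = trace[i]
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if (cur.op == "MUL" and nxt is not None and nxt.op == "ADD"
                and len(nxt.srcs) == 2 and nxt.srcs.count(cur.dst) == 1):
            # d = a * b + c: the ADD operand that is not the product is the addend.
            addend = nxt.srcs[1] if nxt.srcs[0] == cur.dst else nxt.srcs[0]
            out.append(Instr("MAD", nxt.dst, cur.srcs + (addend,)))
            i += 2  # two dynamic instructions replaced by one
        else:
            out.append(cur)
            i += 1
    return out


def reduction(before, after):
    """Fractional drop in dynamic instruction count."""
    return (len(before) - len(after)) / len(before)


# Hypothetical dot-product loop body: two loads, a multiply, an accumulate.
body = [
    Instr("LD", "r1", ("a_ptr",)),
    Instr("LD", "r2", ("b_ptr",)),
    Instr("MUL", "r3", ("r1", "r2")),
    Instr("ADD", "r4", ("r3", "r4")),
]
trace = body * 100           # dynamic trace of 100 loop iterations
fused = fuse_mul_add(trace)
print(len(trace), len(fused), f"{reduction(trace, fused):.0%}")
```

Run as-is, it prints `400 300 25%`: every iteration loses one instruction out of four. Kernels whose inner loops are dominated by loads, branches, or address arithmetic would gain less from fusion alone, which is why the abstract also names predication, uniform branching, 32-bit multiplication, and addressing modes as contributors.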
July 28, 2014 by hgpu