CUDA Fortran for Scientists and Engineers
NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA 95050
NVIDIA Corporation, 2011
@book{ruetsch2011cuda,
title={CUDA Fortran for Scientists and Engineers},
author={Ruetsch, Greg and Fatica, Massimiliano},
year={2011}
}
This document is intended for scientists and engineers who develop or maintain computer simulations and applications in Fortran, and who would like to harness the parallel processing power of graphics processing units (GPUs) to accelerate their code. The goal here is to provide the reader with the fundamentals of GPU programming using CUDA Fortran, as well as some typical examples, without letting the task of developing CUDA Fortran code become an end in itself. The CUDA architecture was developed by NVIDIA to allow use of the GPU for general-purpose computing without requiring the programmer to have a background in graphics. From a programmer’s perspective, there are several ways to access the CUDA architecture: through C/C++ using CUDA C or OpenCL, or through Fortran using PGI’s CUDA Fortran. This document pertains to the latter approach. PGI’s CUDA Fortran should be distinguished from the PGI Accelerator product, which is a directive-based approach to using the GPU. CUDA Fortran is simply the Fortran analog to CUDA C. The reader of this book should be familiar with Fortran 90 concepts, such as modules, derived types, and array operations. However, no experience with parallel programming (on the GPU or otherwise) is required. Part of the appeal of parallel programming on GPUs using CUDA is that the programming model is simple and novices can get parallel code up and running very quickly. CUDA is a hybrid programming model, where both GPU and CPU are utilized, so CPU code can be incrementally ported to the GPU. This document is divided into two main parts. The first is a tutorial on CUDA Fortran programming, from the basics of writing CUDA Fortran code to some tips on optimization. The second is a collection of case studies that demonstrate how the principles in the first part are applied to real-world examples. This document makes use of the PGI 11.x compilers, which can be obtained from http://pgroup.com.
Although the examples can be compiled and run on any supported operating system in a variety of development environments, in this document they are compiled from the command line, as one would do under Linux or Mac OS X.
January 26, 2012 by hgpu