Books on OpenCL and CUDA
Dear colleagues, we would like to present books on OpenCL and CUDA that were published in 2010-2013.
Moreover, you can study programming techniques directly from the source code provided by the authors. A list of open-source OpenCL, CUDA, and related codes can be found HERE.
The CUDA Handbook: A Comprehensive Guide to GPU Programming
The CUDA Handbook begins where CUDA By Example leaves off, discussing both CUDA hardware and software in detail that will engage any CUDA developer, from the casual to the most hardcore. Newer CUDA developers will see how the hardware processes commands and the driver checks progress; hardcore CUDA developers will appreciate topics such as the driver API, context migration, and how best to structure CPU/GPU data interchange and synchronization.
The book is partly a reference and partly a cookbook. Careful descriptions of hardware and software abstractions, best practices, and example source code are included. Much of the source code appears in the form of reusable “microbenchmarks” or “microdemos” designed to expose specific hardware characteristics or highlight specific use cases. Best practices are discussed and accompanied with source code. One idea emphasized is the “EERS Principle” (Empirical Evidence Reigns Supreme): that is, determining the fastest way to perform a given operation is best done empirically.
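The "EERS Principle" is easy to demonstrate with a tiny timing microbenchmark in the spirit of the book's microdemos (a hedged sketch, not code from the book; it uses the standard CUDA runtime API and requires nvcc and an NVIDIA GPU): time the same host-to-device copy from pageable and from pinned host memory, and let the measurements decide which is faster on your system.

```cuda
// Sketch (not from the book): empirically compare host-to-device copy time
// for pageable vs. pinned (page-locked) host memory, timed with CUDA events.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static float timedCopy(void *dst, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 64 << 20;          // 64 MiB test buffer
    void *dev, *pageable, *pinned;
    cudaMalloc(&dev, bytes);
    pageable = malloc(bytes);
    cudaMallocHost(&pinned, bytes);         // page-locked host allocation

    // Empirical Evidence Reigns Supreme: time both and believe the numbers.
    printf("pageable: %.2f ms\n", timedCopy(dev, pageable, bytes));
    printf("pinned:   %.2f ms\n", timedCopy(dev, pinned, bytes));

    cudaFreeHost(pinned);
    free(pageable);
    cudaFree(dev);
    return 0;
}
```

On most systems the pinned copy is measurably faster, but the point of the exercise is that you measure rather than assume.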
The book includes an extensive glossary, because it’s difficult to write about this topic without throwing word salad at the reader.
OpenCL in Action is a thorough, hands-on presentation of OpenCL, with an eye toward showing developers how to build high-performance applications of their own. It begins by presenting the core concepts behind OpenCL, including vector computing, parallel programming, and multi-threaded operations, and then guides you step-by-step from simple data structures to complex functions.
ABOUT THE TECHNOLOGY
Whatever system you have, it probably has more raw processing power than you’re using. OpenCL is a high-performance programming language that maximizes computational power by executing on CPUs, graphics processors, and other number-crunching devices. It’s perfect for speed-sensitive tasks like vector computing, matrix operations, and graphics acceleration.
ABOUT THIS BOOK
OpenCL in Action blends the theory of parallel computing with the practical reality of building high-performance applications using OpenCL. It first guides you through the fundamental data structures in an intuitive manner. Then, it explains techniques for high-speed sorting, image processing, matrix operations, and fast Fourier transform. The book concludes with a deep look at the all-important subject of graphics acceleration. Numerous challenging examples give you different ways to experiment with working code.
A background in C or C++ is helpful, but no prior exposure to OpenCL is needed.
- Learn OpenCL step by step
- Tons of annotated code
- Tested algorithms for maximum performance
Performance Analysis and Tuning for General-Purpose Graphics Processing Units (GPGPU)
General-purpose graphics processing units (GPGPU) have emerged as an important class of shared memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to more traditional multicore systems of today, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens vs. 1-10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches/scratchpad memories (less than 1 megabyte vs. 1-10 megabytes).
In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in previous shared-memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest their connection to GPGPU platforms. We aim to give architects hints for understanding how algorithms map onto GPGPUs. We also provide detailed performance analysis and guidance on optimization, from high-level algorithmic choices down to low-level instruction-level tuning. As a case study, we use the fast multipole method (FMM), an algorithm for n-body particle simulations. We also briefly survey the state of the art in GPU performance analysis tools and techniques.
Table of Contents:
- GPU Design, Programming, and Trends
- Performance Principles
- From Principles to Practice: Analysis and Tuning
- Using Detailed Performance Analysis to Guide Optimization
“Heterogeneous Computing with OpenCL” teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future.
Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials.
- Explains principles and strategies to learn parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications.
- Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more.
- Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures.
- Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.
Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.
Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and OpenCL C programming language.
Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes
- Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
- Programming with OpenCL C and the runtime API
- Using buffers, sub-buffers, images, samplers, and events
- Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
- Simplifying development with the C++ Wrapper API
- Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
- Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
The OpenCL Programming Book starts with the basics of parallelization, covers the main concepts and terminology, shows how to set up an OpenCL development environment, and concludes with a walkthrough of the source code for OpenCL implementations of the fast Fourier transform (FFT) and the Mersenne twister algorithms.
The revised edition includes a summary of the changes made in version 1.2 of the OpenCL Specification, including a reference for the newly added functions and updates to the execution environment.
It is highly recommended for those wishing to get started with programming in OpenCL.
As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.
The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries.
Using an approach refined in a series of well-received articles in Dr. Dobb’s Journal, author Rob Farber takes the reader step by step from fundamentals to implementation, moving from language theory to practical coding.
- Includes multiple examples building from simple to more complex applications in four key areas: machine learning, visualization, vision recognition, and mobile computing
- Addresses the foundational issues for CUDA development: multi-threaded programming and the GPU’s distinct memory hierarchy
- Includes teaching chapters designed to give a full understanding of CUDA tools, techniques and structure.
“This book is required reading for anyone working with accelerator-based computing systems.”
– from the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory
CUDA is a computing architecture designed to facilitate the development of parallel programs. In conjunction with a comprehensive software platform, the CUDA Architecture enables programmers to draw on the immense power of graphics processing units (GPUs) when building high-performance applications. GPUs, of course, have long been available for demanding graphics and game applications. CUDA now brings this valuable resource to programmers working on applications in other domains, including science, engineering, and finance. No knowledge of graphics programming is required; just the ability to program in a modestly extended version of C.
CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature. You’ll discover when to use each CUDA C extension and how to write CUDA software that delivers truly outstanding performance.
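To give a flavor of that "modestly extended C", here is a minimal vector-addition sketch in the spirit of the book's examples (not code taken from it; it builds with nvcc and requires an NVIDIA GPU). The `__global__` qualifier and the `<<<blocks, threads>>>` launch syntax are essentially the only extensions to C visible here.

```cuda
// Minimal CUDA C sketch: element-wise vector addition on the GPU.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device copies.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[0] = %.1f\n", hc[0]);   // 1.0 + 2.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```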
Major topics covered include
- Parallel programming
- Thread cooperation
- Constant memory and events
- Texture memory
- Graphics interoperability
- CUDA C on multiple GPUs
- Advanced atomics
- Additional CUDA resources
All the CUDA software tools you’ll need are freely available for download from NVIDIA.
This is the second volume of Morgan Kaufmann’s GPU Computing Gems, offering an all-new set of insights, ideas, and practical “hands-on” skills from researchers and developers worldwide. Each chapter gives you a window into the work being performed across a variety of application domains, and the opportunity to witness the impact of parallel GPU computing on the efficiency of scientific research.
GPU Computing Gems: Jade Edition showcases the latest research solutions with GPGPU and CUDA, including:
- Improving memory access patterns for cellular automata using CUDA
- Large-scale gas turbine simulations on GPU clusters
- Identifying and mitigating credit risk using large-scale economic capital simulations
- GPU-powered MATLAB acceleration with Jacket
- Biologically-inspired machine vision
- An efficient CUDA algorithm for the maximum network flow problem
- 30 more chapters of innovative GPU computing ideas, written to be accessible to researchers from any industry
GPU Computing Gems: Jade Edition contains 100% new material covering a variety of application domains: algorithms and data structures, engineering, interactive physics for games, computational finance, and programming tools.
- This second volume of GPU Computing Gems offers 100% new material of interest across industry, including finance, medicine, imaging, engineering, gaming, environmental science, green computing, and more
- Covers new tools and frameworks for productive GPU computing application development and offers immediate benefit to researchers developing improved programming environments for GPUs
- Even more hands-on, proven techniques demonstrating how general purpose GPU computing is changing scientific research
“…the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk.” – Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010
Graphics processing units (GPUs) can do much more than render graphics. Scientists and researchers increasingly look to GPUs to improve the efficiency and performance of computationally-intensive experiments across a range of disciplines.
GPU Computing Gems: Emerald Edition brings their techniques to you, showcasing GPU-based solutions including:
- Black hole simulations with CUDA
- GPU-accelerated computation and interactive display of molecular orbitals
- Temporal data mining for neuroscience
- GPU-based parallelization for fast circuit optimization
- Fast graph cuts for computer vision
- Real-time stereo on GPGPU using progressive multi-resolution adaptive windows
- GPU image demosaicing
- Tomographic image reconstruction from unordered lines with CUDA
- Medical image processing using GPU-accelerated ITK image filters
- 41 more chapters of innovative GPU computing ideas, written to be accessible to researchers from any domain
GPU Computing Gems: Emerald Edition is the first volume in Morgan Kaufmann’s Applications of GPU Computing Series, offering the latest insights and research in computer vision, electronic design automation, emerging data-intensive applications, life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video/image processing.
- Covers the breadth of industry from scientific simulation and electronic design automation to audio/video processing, medical imaging, computer vision, and more
- Many examples leverage NVIDIA’s CUDA parallel computing architecture, the most widely-adopted massively parallel programming solution
This book covers the emerging topic of GPU computing through many applications drawn from diverse fields such as networking, seismology, fluid mechanics, nano-materials, data mining, earthquake and mantle-convection modeling, and visualization. It shows why GPU computing is important and easy to use, why it is useful, and how to implement GPU codes in everyday situations.
In 2011 many computer users were exploring the opportunities and the benefits of the massive parallelism offered by heterogeneous computing. In 2000 the Khronos Group, a not-for-profit industry consortium, was founded to create standard open APIs for parallel computing, graphics, and dynamic media. Among them has been OpenCL, an open system for programming heterogeneous computers with components made by multiple manufacturers. This publication explains how heterogeneous computers work and how to program them using OpenCL. It also describes how to combine OpenCL with OpenGL for displaying graphical effects in real time. Chapter 1 briefly describes two older, highly successful de facto standard parallel programming systems: MPI and OpenMP. Collectively, the MPI, OpenMP, and OpenCL systems cover programming of all major parallel architectures: clusters, shared-memory computers, and the newest heterogeneous computers. Chapter 2, the technical core of the book, deals with OpenCL fundamentals: programming, hardware, and the interaction between them. Chapter 3 adds important information on advanced issues such as double- versus single-precision arithmetic, efficiency, memory use, and debugging. Chapters 2 and 3 contain several code examples and a case study on genetic algorithms. These examples are related to linear algebra operations, which are very common in scientific, industrial, and business applications. Most of the book’s examples can be found on the enclosed CD, which also contains basic projects for Visual Studio, MinGW, and GCC. This supplementary material will assist the reader in getting a quick start on OpenCL projects.
CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs
If you need to learn CUDA but don’t have experience with parallel computing, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then delves into CUDA installation. Chapters on core concepts, including threads, blocks, grids, and memory, focus on both parallel and CUDA-specific issues. Later, the book demonstrates CUDA in practice: optimizing applications, adjusting to new hardware, and solving common problems.
- Comprehensive introduction to parallel programming with CUDA, for readers new to both
- Detailed instructions help readers optimize the CUDA software development kit
- Practical techniques illustrate working with memory, threads, algorithms, resources, and more
- Covers CUDA on multiple hardware platforms: Mac, Linux and Windows with several NVIDIA chipsets
- Each chapter includes exercises to test reader knowledge
This book focuses on advanced rendering techniques that run on the DirectX and/or OpenGL run-time with any shader language available. It includes articles on the latest and greatest techniques in real-time rendering, including MLAA, adaptive volumetric shadow maps, light propagation volumes, wrinkle animations, and much more. The book emphasizes techniques for handheld programming to reflect the increased importance of graphics on mobile devices. It covers geometry manipulation, effects in image space, shadows, 3D engine design, GPGPU, and graphics-related tools.
GPU Pro 3, the third volume in the GPU Pro book series, offers practical tips and techniques for creating real-time graphics that are useful to beginners and seasoned game and graphics programmers alike.
Section editors Wolfgang Engel, Christopher Oat, Carsten Dachsbacher, Wessam Bahnassi, and Sebastien St-Laurent have once again brought together a high-quality collection of cutting-edge techniques for advanced GPU programming. With contributions by more than 50 experts, GPU Pro 3: Advanced Rendering Techniques covers battle-tested tips and tricks for creating interesting geometry, realistic shading, real-time global illumination, and high-quality shadows, for optimizing 3D engines, and for taking advantage of the advanced power of the GPGPU.
Sample programs and source code to accompany some of the chapters are available at http://www.akpeters.com/gpupro
Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series)
Multi-core processors are no longer the future of computing; they are the present-day reality. A typical mass-produced CPU features multiple processor cores, while a GPU (Graphics Processing Unit) may have hundreds or even thousands of cores. With the rise of multi-core architectures has come the need to teach advanced programmers a new and essential skill: how to program massively parallel processors.
Programming Massively Parallel Processors: A Hands-on Approach shows both student and professional alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs.
- Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing.
- Utilizes CUDA (Compute Unified Device Architecture), NVIDIA’s software development tool created specifically for massively parallel environments.
Topics of performance, floating-point format, parallel patterns, and dynamic parallelism are covered in depth.
This best-selling guide to CUDA and GPU parallel programming has been revised with more parallel programming examples, commonly-used libraries such as Thrust, and explanations of the latest tools. With these improvements, the book retains its concise, intuitive, practical approach based on years of road-testing in the authors’ own parallel computing courses.
Updates in this new edition include:
- New coverage of CUDA 5.0, improved performance, enhanced development tools, increased hardware support, and more
- Increased coverage of related technology, OpenCL and new material on algorithm patterns, GPU clusters, host programming, and data parallelism
- Two new case studies (on MRI reconstruction and molecular visualization) explore the latest applications of CUDA and GPUs for scientific research and high-performance computing
Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number of applications that traditionally used Application Specific Integrated Circuits (ASICs) are now implemented with concurrent processors in order to improve functionality and reduce engineering cost. The real challenge is to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals. The aim of this course is to provide students with knowledge and hands-on experience in developing applications software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it has the ability to complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already offer such levels of concurrency. Effectively programming these processors will require in-depth knowledge about parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors. The target audiences of the course are students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for these processors. Visit the CS193G companion website for course materials.
Books on parallel programming theory often talk about such weird beasts as the PRAM model, a hypothetical machine that provides the programmer with a number of processors proportional to the input size of the problem at hand. Modern general-purpose computers afford only a few processing units; four is currently a reasonable number. This limitation makes the development of highly parallel applications quite difficult for the average computer user. However, the low cost and increasing programmability of graphics processing units, popularly known as GPUs, are helping to overcome this difficulty. Presently, for a few hundred dollars, an application developer can have access to hardware boasting hundreds of processing elements. This brave new world now open to many programmers brings, alongside incredible possibilities, difficulties and challenges. Perhaps for the first time since the popularization of computers, it makes sense to open the compiler books at the final chapters, which talk about very unusual concepts such as polyhedral loops, iteration spaces, and Fourier-Motzkin elimination, to name just a few of these chimerical creatures. This material covers, in a very condensed way, some code generation and optimization techniques that a compiler would use to produce efficient code for graphics processing units. Through these techniques, the compiler writer tries to free application developers from the intricacies and subtleties of GPU programming, giving them more freedom to focus on algorithms instead of micro-optimizations. We discuss a little of what GPUs are, which applications should target them, how the compiler sees a GPU program, and how the compiler can transform this program so that it gets more out of this very powerful hardware.
Contemporary High Performance Computing: From Petascale toward Exascale
“Contemporary High Performance Computing: From Petascale toward Exascale” focuses on the ecosystems surrounding the world’s leading centers for high performance computing (HPC). It covers many of the important factors involved in each ecosystem: computer architectures, software, applications, facilities, and sponsors.
The first part of the book examines significant trends in HPC systems, including computer architectures, applications, performance, and software. It discusses the growth from terascale to petascale computing and the influence of the TOP500 and Green500 lists. The second part of the book provides a comprehensive overview of 18 HPC ecosystems from around the world. Each chapter in this section describes the programmatic motivation for HPC and its important applications; gives an overview of a flagship HPC system, covering computer architecture, system software, programming systems, storage, visualization, and analytics support; and surveys the associated data center/facility. The last part of the book addresses the role of clouds and grids in HPC, including chapters on the Magellan, FutureGrid, and LLGrid projects.
With contributions from top researchers directly involved in designing, deploying, and using these supercomputing systems, this book captures a global picture of the state of the art in HPC.
“High Performance Computing: Programming and Applications” presents techniques that address new performance issues in the programming of high performance computing (HPC) applications. Omitting tedious details, the book discusses hardware architecture concepts and programming techniques that are the most pertinent to application developers for achieving high performance. Even though the text concentrates on C and Fortran, the techniques described can be applied to other languages, such as C++ and Java.
Drawing on their experience with chips from AMD and systems, interconnects, and software from Cray Inc., the authors explore the problems that create bottlenecks in attaining good performance. They cover techniques that pertain to each of the three levels of parallelism:
- Message passing between the nodes
- Shared memory parallelism on the nodes or the multiple instruction, multiple data (MIMD) units on the accelerator
- Vectorization on the inner level
After discussing architectural and software challenges, the book outlines a strategy for porting and optimizing an existing application to a large massively parallel processor (MPP) system. With a look toward the future, it also introduces the use of general purpose graphics processing units (GPGPUs) for carrying out HPC computations. A companion website at www.hybridmulticoreoptimization.com contains all the examples from the book, along with updated timing results on the latest released processors.
GPUs may have started life as graphics processors, but they have since emerged as fantastic numerical co-processors for general high-performance applications. This book not only teaches you the fundamentals of parallel programming with GPUs, it helps you think in parallel. You learn best practices, algorithms, and designs for achieving greater application performance with these processors.
Amazon recently added GPU supercomputing to its cloud-computing platform, a clear sign that parallel programming is becoming an essential skill. This book includes valuable input from major CPU and GPU manufacturers (Intel, NVIDIA, and AMD) to help experienced programmers get a head start on programming GPU applications.
- Understand the differences between parallel and sequential programming
- Learn about GPU architecture, including the runtime environment, threads, and memory
- Build and deploy GPU applications and libraries, and port existing applications
- Use debugging and profiling tools and techniques
- Write GPU programs for clusters and the cloud
GPUs have been increasingly used as tools for high-performance computation, in addition to graphics. They play a major role in finance, bioinformatics, image processing, artificial intelligence, and other areas that require extremely high computational performance. Amazon has noticed this trend, and recently added GPU instances to EC2. This book shows you how to use Amazon’s EC2 GPU instances effectively, whatever your problem domain.
Parallel and Concurrent Programming in Haskell: Techniques for Multicore and Multithreaded Programming
This book covers the breadth of Haskell’s diverse selection of programming APIs for concurrent and parallel programming. It is split into two parts. The first part, on parallel programming, covers the techniques for using multiple processors to speed up CPU-intensive computations, including methods for using parallelism in both idiomatic Haskell and numerical array-based algorithms, and for running computations on a GPU. The second part, on concurrent programming, covers techniques for using multiple threads, including overlapping multiple I/O operations, building concurrent network servers, and distributed programming across multiple machines.
Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, Intel Field Engineers, Application Engineers and Technical Consulting Engineers, to create this authoritative first book on the essentials of programming for this new architecture and these new products.
This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these techniques will generally increase your program performance on any system, and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture.
- A practical guide to the essentials of the Intel Xeon Phi coprocessor
- Presents best practices for portable, high-performance computing and a familiar and proven threaded, scalar-vector programming model
- Includes simple but informative code examples that explain the unique aspects of this new highly parallel and high performance computational product
- Covers wide vectors, many cores, many threads and high bandwidth cache/memory architecture
Parallel Programming and Optimization with Intel® Xeon Phi™ Coprocessors
This book will guide you to the mastery of parallel programming with Intel® Xeon® family products: Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. It includes a detailed presentation of the programming paradigm for Intel® Xeon® product family, optimization guidelines, and hands-on exercises on systems equipped with the Intel® Xeon Phi™ coprocessors, as well as instructions on using Intel software development tools and libraries included in Intel Parallel Studio XE.
This book is targeted toward developers familiar with C/C++ programming in Linux. Developers with little parallel programming experience will be able to grasp the core concepts of these subjects from the detailed commentary in Chapter 3. For advanced developers familiar with multi-core and/or GPU programming, the ebook offers materials specific to Intel compilers and Intel® Xeon® family products, as well as optimization advice pertinent to Many Integrated Core (MIC) architecture.
We have written these materials relying on key elements for efficient learning: practice and repetition. As a consequence, the reader will find a great number of code listings in the main section of these materials. In the extended appendix, we provide numerous hands-on exercises that one can complete either under an instructor’s supervision or autonomously in a self-paced training environment.
This document is different from a typical book on computer science, because we intended it to be used as a lecture plan in an intensive learning course. Speaking in programming terms, a typical book traverses material with a depth-first algorithm, describing every detail of each method or concept before moving on to the next method. In contrast, this document traverses the scope of materials with a breadth-first algorithm. First, we give an overview of multiple methods to address a certain issue. In the subsequent chapter, we re-visit these methods, this time in greater detail. We may go into even more depth down the line. In this way, we expect that developers will have enough time to absorb and comprehend the variety of programming and optimization methods presented here.