high performance computing on graphics processing units: hgpu.org

GPU performance prediction using parametrized models

Andreas Resios

Utrecht University

Utrecht University, 2011

Compilation for modern architectures has become increasingly difficult as computers and computing needs have evolved. In particular, programmers expect the compiler to produce optimized code for a variety of hardware and to make the most of its theoretical performance. For years this was not a problem, because hardware vendors consistently delivered increases in clock rates and instruction-level parallelism, so single-threaded programs ran faster on newer processors without any modification. Today, to increase performance and overcome physical limitations, the hardware industry favours multi-core CPUs and massively parallel hardware accelerators (GPUs, FPGAs), and software has to be written explicitly in a multi-threaded or multi-process manner to gain performance. The performance problem has thus shifted from hardware designers to compiler writers and software developers, who now have to perform parallelization.

Such a transformation involves identifying independent data and computation and mapping them to a complex hierarchy of memory, computing, and interconnection resources. When parallelizing, it is important to account for the overhead introduced by communication, thread spawning, and synchronization. If the overhead is high, the optimization can lead to a performance loss. An important question in this process is therefore whether the optimization brings any performance improvement. The answer is usually computed using a performance model, which is an abstraction of the target hardware [29, 30].

Our research addresses this problem in the context of parallelizing sequential programs for GPU platforms. The main result is a GPU performance model for data-parallel programs that predicts the execution time and identifies the bottlenecks of GPU programs. In this thesis we present the factors that influence GPU performance and show how our model takes them into account. We validated our model in the context of a production-ready analysis tool, vfEmbedded [33], which combines static and dynamic analyses to parallelize C code for heterogeneous platforms. Since the tool has an interactive compilation workflow, our model not only estimates execution time but also computes several metrics that help users decide whether their program is worth porting to the GPU.
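
The trade-off between parallel speedup and offloading overhead can be illustrated with a deliberately simple sketch. This is not the model developed in the thesis: it assumes the GPU time is just the sum of a host-device transfer term, a fixed launch overhead, and a compute term, and every name and parameter here (`estimate_offload`, `pcie_bytes_per_s`, `gpu_flops_per_s`, `launch_overhead_s`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    """Breakdown of an estimated GPU execution time, in seconds."""
    transfer_s: float  # time spent copying data between host and device
    launch_s: float    # fixed overhead for launching the kernel
    kernel_s: float    # time spent computing on the device

    @property
    def total_s(self) -> float:
        return self.transfer_s + self.launch_s + self.kernel_s

    @property
    def bottleneck(self) -> str:
        # The largest term is reported as the bottleneck.
        parts = {
            "transfer": self.transfer_s,
            "launch": self.launch_s,
            "kernel": self.kernel_s,
        }
        return max(parts, key=parts.get)


def estimate_offload(bytes_moved: float, flops: float,
                     pcie_bytes_per_s: float, gpu_flops_per_s: float,
                     launch_overhead_s: float) -> OffloadEstimate:
    """Toy additive model (an assumption, not the thesis's model)."""
    return OffloadEstimate(
        transfer_s=bytes_moved / pcie_bytes_per_s,
        launch_s=launch_overhead_s,
        kernel_s=flops / gpu_flops_per_s,
    )


def worth_offloading(cpu_time_s: float, estimate: OffloadEstimate) -> bool:
    """Offloading pays off only if the estimated GPU time beats the CPU time."""
    return estimate.total_s < cpu_time_s


if __name__ == "__main__":
    # Illustrative numbers only; they do not describe any real device.
    est = estimate_offload(bytes_moved=8e9, flops=1e9,
                           pcie_bytes_per_s=8e9, gpu_flops_per_s=1e12,
                           launch_overhead_s=1e-5)
    print(est.bottleneck, worth_offloading(0.5, est))
```

With these made-up inputs the transfer term dominates, so a kernel that is fast in isolation still loses to a 0.5 s CPU run. A realistic model would also have to account for factors such as memory access patterns and occupancy, which this sketch ignores.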

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 460, Optimization, Performance, Thesis

October 22, 2011 by hgpu
