high performance computing on graphics processing units: hgpu.org

Estimating GPU Speedups for Programs Without Writing a Single Line of GPU Code

Newsha Ardalani, Karthikeyan Sankaralingam, Xiaojin Zhu

University of Wisconsin Madison

University of Wisconsin Madison, Technical Report TR1811, 2014

Heterogeneous processing using GPUs is here to stay and today spans mobile devices, laptops, and supercomputers. Although modern software development frameworks like OpenCL and CUDA provide high-productivity environments, software development for GPUs is time-consuming. First, much work needs to be done to restructure software and data organization to match the GPU’s many-threaded programming model. Second, code optimization is quite time-consuming, and performance analysis tools require significant expertise to use effectively. Third, until the final optimized code has been derived, it is almost impossible today to know what performance advantage will be gained by porting a code to a GPU. This paper focuses on this last question and seeks to develop an automated "performance prediction" tool that can provide an accurate estimate of GPU speedup when given a piece of CPU code, before any GPU code is developed. Our paper is built on two insights: i) ultimately, the speedup of a piece of code on a GPU depends on fundamental microarchitecture-independent program properties such as available parallelism and branching behavior; ii) by examining a vast array of previously implemented GPU codes along with their CPU counterparts, we can use machine learning to learn the correlation between program properties and GPU speedup. In this paper, we use linear regression, specifically a technique inspired by regularized regression, to build a model for GPU speedup prediction. When applied to never-before-seen test data selected randomly from the Rodinia, Parboil, Lonestar, and Parsec benchmark suites (speedup range of 5.9X to 276X), our tool makes accurate predictions, with an average weighted error of 32%. Our technique is also robust: the errors remain similar across other "unseen" GPU platforms we test on. Essentially, we deliver an automated tool that programmers can use to estimate potential GPU speedup before writing any GPU code.
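To make the idea of regressing speedup on program properties concrete, here is a minimal sketch of plain ridge (L2-regularized) linear regression in Python. It is not the authors' tool or their exact technique, which the abstract describes only as "inspired by regularized regression". The two features (`parallel_fraction`, `branch_divergence`), the training numbers, and the helper names `solve`, `fit_ridge`, and `predict` are all assumptions made up for illustration; none of them come from the report.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(rows, targets, lam):
    """Return [bias, w1, ...] minimizing squared error + lam * sum(w_i^2).

    The bias term is left unpenalized, a common convention (assumed here).
    """
    xs = [[1.0] + list(r) for r in rows]  # prepend a constant 1 for the bias
    d = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        gram[i][i] += lam  # ridge penalty on the non-bias weights only
    rhs = [sum(x[i] * t for x, t in zip(xs, targets)) for i in range(d)]
    return solve(gram, rhs)


def predict(weights, row):
    """Estimate speedup for one program described by its feature row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Synthetic training set (NOT data from the report): each row is a hypothetical
# (parallel_fraction, branch_divergence) pair measured on CPU code, and each
# target is an invented "observed" GPU speedup for that program.
train_rows = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.3), (0.7, 0.2), (0.4, 0.1)]
train_speedups = [2 + 40 * p - 15 * b for p, b in train_rows]

weights = fit_ridge(train_rows, train_speedups, lam=0.1)
estimate = predict(weights, (0.6, 0.2))  # a program with no GPU port yet
print(f"estimated speedup: {estimate:.1f}x")
```

The regularization strength `lam` trades fit against weight size: with `lam=0` this reduces to ordinary least squares, while larger values shrink the weights, which can help when there are few training programs relative to the number of program properties.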

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, nVidia GeForce GTX 660 Ti, OpenCL, Performance, Tesla K20

August 23, 2014 by hgpu
