high performance computing on graphics processing units: hgpu.org

Autotuning CUDA Compiler Parameters for Heterogeneous Applications using the OpenTuner Framework

Pedro Bruel, Marcos Amaris, Alfredo Goldman

Instituto de Matematica e Estatistica (IME), Universidade de Sao Paulo (USP), R. do Matao, 1010 – Cidade Universitaria, Sao Paulo – SP, 05508-090

Concurrency and Computation: Practice and Experience, 2017

Package: gpu-autotuning

A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists of finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of a set of problem instances. An autotuner solves the Algorithm Selection Problem using search and optimization techniques. In this paper we implement an autotuner for the CUDA compiler’s parameters using the OpenTuner framework. The autotuner searches for a set of compilation parameters that minimizes the time to solve a problem. We analyse the speedups, relative to the compiler’s high-level optimizations, achieved on three different GPU devices for 17 heterogeneous GPU applications, 12 of which are from the Rodinia Benchmark Suite. The autotuner often beat the compiler’s high-level optimizations, but underperformed for some problems. We achieved over 2x speedup for Gaussian Elimination and almost 2x speedup for Heart Wall, both problems from the Rodinia Benchmark Suite, and over 4x speedup for a matrix multiplication algorithm.
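To make the idea of searching over compilation parameters concrete, here is a minimal sketch in Python. It is not the authors' autotuner and does not use OpenTuner: a plain random search stands in for OpenTuner's search techniques. The search space is an assumption for illustration (a handful of nvcc options; the paper's actual parameter set is not given in this listing), and `kernel.cu`, `./kernel_tuned`, and `fake_measure` are hypothetical names.

```python
import random
import subprocess
import time

# Illustrative search space (assumption): a few nvcc options and candidate values.
SEARCH_SPACE = {
    "-O": ["0", "1", "2", "3"],
    "--ftz=": ["true", "false"],
    "--prec-div=": ["true", "false"],
    "--fmad=": ["true", "false"],
    "--maxrregcount=": ["16", "32", "64"],
}


def to_flags(config):
    """Turn a configuration dict into a list of command-line flags."""
    return [option + value for option, value in sorted(config.items())]


def random_config(rng):
    """Pick one value per option, uniformly at random."""
    return {option: rng.choice(values) for option, values in SEARCH_SPACE.items()}


def autotune(measure, trials=50, seed=0):
    """Random search: keep the configuration with the lowest measured time."""
    rng = random.Random(seed)
    best_config, best_time = None, float("inf")
    for _ in range(trials):
        config = random_config(rng)
        elapsed = measure(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def measure_with_nvcc(config, source="kernel.cu", binary="./kernel_tuned"):
    """Compile a (hypothetical) CUDA source with the given flags and time one run."""
    subprocess.run(["nvcc", *to_flags(config), source, "-o", binary], check=True)
    start = time.perf_counter()
    subprocess.run([binary], check=True)
    return time.perf_counter() - start


def fake_measure(config):
    """Made-up cost model so the search can be tried without a GPU or nvcc."""
    cost = 4.0 - int(config["-O"])
    if config["--fmad="] == "true":
        cost -= 0.5
    return cost


if __name__ == "__main__":
    config, elapsed = autotune(fake_measure, trials=100)
    print(to_flags(config), elapsed)
```

Passing `measure_with_nvcc` instead of `fake_measure` would time real compilations, assuming `nvcc` is on the path and the source file exists. A real autotuner would typically repeat each measurement and use smarter search than uniform sampling; the speedup reported for a configuration is the baseline run time divided by the tuned run time.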

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, Matrix multiplication, nVidia, nVidia GeForce GTX 750, nVidia GeForce GTX 980, Package, Performance, Tesla K40

November 16, 2016 by hgpu
