high performance computing on graphics processing units: hgpu.org

Exploring the Optimization Space of Multi-Core Architectures with OpenCL Benchmarks

Deepak Mathews Panickal

School of Informatics, University of Edinburgh

University of Edinburgh, 2011

Open Computing Language (OpenCL) is an open standard for writing portable software for heterogeneous architectures such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Programs written in OpenCL are functionally portable across architectures. However, because of architectural differences, OpenCL does not guarantee performance portability. As previous research shows, different architectures are sensitive to different optimization parameters. A parameter value that performs well on one architecture may perform poorly on another. In this thesis, the optimization space of multi-core architectures is explored by running OpenCL benchmarks. The benchmarks are run for all possible combinations of optimization parameters. Exploring the optimization space is not a trivial task, as various factors, such as the number of threads and the vectorization factor, affect performance. The range of values that each parameter can take is quite large. For example, the number of threads can vary from 1 to 2^25. Four different architectures are evaluated in this thesis. Considering all the parameter combinations for all four architectures, the optimization space is prohibitively large to be explored within the time constraints of the project. Impossible combinations are therefore pruned to reduce the exploration space. Over 600,000 runs of the OpenCL benchmarks are executed to exhaustively explore the pruned space and successfully identify the optimal optimization parameters. In addition, the rationale for a parameter being the best on a particular architecture is investigated. The findings of the thesis could be used by developers to significantly improve the performance of their OpenCL applications. They could also be incorporated into a compiler for automatic optimization based on the target architecture.
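The enumerate-then-prune idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the thesis's actual setup: the parameter names (`GLOBAL_SIZES`, `LOCAL_SIZES`, `VECTOR_WIDTHS`), the power-of-two grids, the 1024 work-group limit, and the pruning rule itself. Only the 1 to 2^25 thread range comes from the abstract.

```python
from itertools import product

# Hypothetical parameter grids (illustrative only, not taken from the thesis).
GLOBAL_SIZES = [2 ** k for k in range(0, 26)]   # thread counts 1 .. 2^25
LOCAL_SIZES = [2 ** k for k in range(0, 11)]    # work-group sizes 1 .. 1024
VECTOR_WIDTHS = [1, 2, 4, 8, 16]                # scalar plus common vector widths


def is_feasible(global_size, local_size, max_local_size=1024):
    """Assumed pruning rule: reject launches an OpenCL 1.x runtime would refuse."""
    if local_size > max_local_size:
        return False
    if local_size > global_size:
        return False
    # OpenCL 1.x requires the global size to be a multiple of the local size.
    return global_size % local_size == 0


def search_space(max_local_size=1024):
    """Yield every (global, local, vector width) combination that survives pruning."""
    for g, l, v in product(GLOBAL_SIZES, LOCAL_SIZES, VECTOR_WIDTHS):
        if is_feasible(g, l, max_local_size):
            yield g, l, v


total = len(GLOBAL_SIZES) * len(LOCAL_SIZES) * len(VECTOR_WIDTHS)
kept = sum(1 for _ in search_space())
print(f"{kept} of {total} combinations remain after pruning")
```

In a real exploration, each surviving combination would be passed to a benchmark run and timed per architecture; the device limit would be queried from the OpenCL runtime rather than hard-coded.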

Tags: ATI, ATI Radeon HD 5470, Benchmarking, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce 8800 GTX, OpenCL, Optimization, Tesla C2070, Thesis

December 9, 2011 by hgpu
