high performance computing on graphics processing units: hgpu.org

A Study of the Potential of Locality-Aware Thread Scheduling for GPUs

Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal

Eindhoven University of Technology, Eindhoven, The Netherlands

7th International Workshop on Multi/many-Core Computing Systems (MuCoCoS’14), 2014

Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threads, effectively removing ordering constraints. Still, parallel architectures such as the graphics processing unit (GPU) do not exploit the potential for data locality enabled by this independence. Programmers are therefore required to perform data-locality optimisations, such as memory coalescing or loop tiling, by hand. This work makes a case for locality-aware thread scheduling: re-ordering threads automatically for better locality, to improve the programmability of multi-threaded processors. In particular, we analyse the potential of locality-aware thread scheduling for GPUs, considering, among other factors, cache performance, memory coalescing and bank locality. This work does not present an implementation of a locality-aware thread scheduler, but rather introduces the concept and identifies its potential. We conclude that non-optimised programs have the potential to achieve good cache and memory utilisation when using a smarter thread scheduler. For example, a case study of a naive matrix multiplication shows an 87% performance increase, leading to an IPC of 457 on a 512-core GPU.
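The effect of thread order on cache behaviour can be illustrated with a small toy model. The sketch below is not the paper's simulator or method: it assumes a single LRU cache, threads that run strictly one after another, and hypothetical parameters (`N`, `LINE`, `CAPACITY`, `TILE`) chosen only for illustration. It counts cache misses for a naive matrix multiplication, in which thread `(i, j)` reads row `i` of `A` and column `j` of `B`, under two thread orders: plain row-major order and a tiled order that groups neighbouring threads.

```python
from collections import OrderedDict

# All parameters are hypothetical toy values, not taken from the paper.
N = 16         # matrices are N x N
LINE = 4       # elements per cache line
CAPACITY = 32  # cache size in lines (fully associative, LRU)
TILE = 4       # tile edge used by the re-ordered schedule


def thread_accesses(i, j):
    """Yield the cache lines read by the thread computing C[i][j]."""
    for k in range(N):
        yield ("A", (i * N + k) // LINE)  # A[i][k], row-major layout
        yield ("B", (k * N + j) // LINE)  # B[k][j], row-major layout


def count_misses(thread_order):
    """Run threads one after another and count LRU cache misses."""
    cache = OrderedDict()
    misses = 0
    for i, j in thread_order:
        for line in thread_accesses(i, j):
            if line in cache:
                cache.move_to_end(line)        # mark as most recently used
            else:
                misses += 1
                cache[line] = True
                if len(cache) > CAPACITY:
                    cache.popitem(last=False)  # evict least recently used
    return misses


def row_major_order():
    """Default order: threads sorted by row, then by column."""
    return [(i, j) for i in range(N) for j in range(N)]


def tiled_order():
    """Same threads, grouped into TILE x TILE blocks of the output matrix."""
    order = []
    for bi in range(0, N, TILE):
        for bj in range(0, N, TILE):
            for i in range(bi, bi + TILE):
                for j in range(bj, bj + TILE):
                    order.append((i, j))
    return order


if __name__ == "__main__":
    print("row-major misses:", count_misses(row_major_order()))
    print("tiled misses:    ", count_misses(tiled_order()))
```

Both schedules execute exactly the same threads on unmodified kernel code; only the order differs. In this toy model the tiled order misses less often because each tile reuses a small set of `A` rows and `B` columns before moving on, which is the kind of locality a scheduler could recover without the programmer tiling the loop by hand.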

Tags: Computer science, CUDA, Matrix multiplication, nVidia, nVidia GeForce GTX 580, OpenCL, Task scheduling

September 28, 2014 by hgpu
