Understanding Latency Hiding on GPUs

Vasily Volkov
Electrical Engineering and Computer Sciences, University of California at Berkeley
University of California at Berkeley, Technical Report No. UCB/EECS-2016-143, 2016

   author={Volkov, Vasily},

   title={Understanding Latency Hiding on GPUs},

   school={EECS Department, University of California, Berkeley},






Download Download (PDF)   View View   Source Source   



Modern commodity processors such as GPUs may execute up to about a thousand of physical threads per chip to better utilize their numerous execution units and hide execution latencies. Understanding this novel capability, however, is hindered by the overall complexity of the hardware and complexity of typical workloads. In this dissertation, we suggest a better way to understand modern multithreaded performance by considering a family of synthetic workloads, which use the same key hardware capabilities – memory access, arithmetic operations, and multithreading – but are otherwise as simple as possible. One of our surprising findings is that prior performance models for GPUs fail on these workloads: they mispredict observed throughputs by factors of up to 1.7. We analyze these prior approaches, identify a number of common pitfalls, and discuss the related subtleties in understanding concurrency and Little’s Law. Also, we help to further our understanding by considering a few basic questions, such as on how different latencies compare with each other in terms of latency hiding, and how the number of threads needed to hide latency depends on basic parameters of executed code such as arithmetic intensity. Finally, we outline a performance modeling framework that is free from the found limitations. As a tangential development, we present a number of novel experimental studies, such as on how mean memory latency depends on memory throughput, how latencies of individual memory accesses are distributed around the mean, and how occupancy varies during execution.
VN:F [1.9.22_1171]
Rating: 5.0/5 (4 votes cast)
Understanding Latency Hiding on GPUs, 5.0 out of 5 based on 4 ratings

* * *

* * *

TwitterAPIExchange Object
    [oauth_access_token:TwitterAPIExchange:private] => 301967669-yDz6MrfyJFFsH1DVvrw5Xb9phx2d0DSOFuLehBGh
    [oauth_access_token_secret:TwitterAPIExchange:private] => o29ji3VLVmB6jASMqY8G7QZDCrdFmoTvCDNNUlb7s
    [consumer_key:TwitterAPIExchange:private] => TdQb63pho0ak9VevwMWpEgXAE
    [consumer_secret:TwitterAPIExchange:private] => Uq4rWz7nUnH1y6ab6uQ9xMk0KLcDrmckneEMdlq6G5E0jlQCFx
    [postfields:TwitterAPIExchange:private] => 
    [getfield:TwitterAPIExchange:private] => ?cursor=-1&screen_name=hgpu&skip_status=true&include_user_entities=false
    [oauth:protected] => Array
            [oauth_consumer_key] => TdQb63pho0ak9VevwMWpEgXAE
            [oauth_nonce] => 1477202542
            [oauth_signature_method] => HMAC-SHA1
            [oauth_token] => 301967669-yDz6MrfyJFFsH1DVvrw5Xb9phx2d0DSOFuLehBGh
            [oauth_timestamp] => 1477202542
            [oauth_version] => 1.0
            [cursor] => -1
            [screen_name] => hgpu
            [skip_status] => true
            [include_user_entities] => false
            [oauth_signature] => CLS0bzP0EN2HyyJ2UNDPQwer7MM=

    [url] => https://api.twitter.com/1.1/users/show.json
Follow us on Facebook
Follow us on Twitter

HGPU group

2033 peoples are following HGPU @twitter

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors

Contact us: