Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU

Shixiong Xu, David Gregg
Software Tools Group, Department of Computer Science, Trinity College, The University of Dublin, Ireland
2015 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2015

@inproceedings{Xu2015,
   title={Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU},
   author={Xu, Shixiong and Gregg, David},
   booktitle={2015 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)},
   year={2015}
}





Memory performance is of great importance for achieving high performance on the Nvidia CUDA GPU. Previous work has proposed specific optimizations such as thread coarsening, caching data in shared memory, and global data layout transformation. We argue that vectorization based on hyper-loop parallelism can be used as a unified technique to optimize memory performance. In this paper, we put forward a compiler framework based on the Cetus source-to-source compiler to improve memory performance on the CUDA GPU by efficiently exploiting hyper-loop parallelism in vectorization. We introduce abstractions of SIMD vectors and SIMD operations that match the execution model and memory model of the CUDA GPU, along with three different execution mapping strategies for efficiently offloading vectorized code to CUDA GPUs. In addition, as we employ vectorization in C-to-CUDA with automatic parallelization, our technique further refines the mapping granularity between coarse-grain loop parallelism and GPU threads. We evaluated our proposed technique on two platforms, an embedded GPU system (Jetson TK1) and a desktop GPU (GeForce GTX 645). The experimental results demonstrate that our vectorization technique based on hyper-loop parallelism can yield performance speedups of up to 2.5x compared to the direct coarse-grain loop parallelism mapping.
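To make the memory-performance motivation concrete, the following is an illustrative CUDA sketch (not code from the paper): under the direct coarse-grain mapping, one GPU thread takes a whole outer-loop iteration, so adjacent threads in a warp touch addresses a full row apart; when the inner loop is instead treated as a SIMD vector spread across consecutive threads, a warp reads consecutive addresses and the hardware coalesces them into few memory transactions. Kernel names and the scaling operation are hypothetical.

```cuda
// Coarse-grain mapping: one thread per outer-loop iteration (row).
// Thread t accesses in[t*n + j]; within a warp, addresses are n elements
// apart, so each load is serviced by many separate memory transactions.
__global__ void row_per_thread(const float *in, float *out, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    for (int j = 0; j < n; ++j)
        out[row * n + j] = in[row * n + j] * 2.0f;  // stride-n across the warp
}

// Vector-style mapping: the inner loop is distributed over consecutive
// threads (launched with one block row per outer iteration, e.g.
// grid = dim3(1, num_rows)), so a warp touches consecutive addresses
// and its loads coalesce into a small number of transactions.
__global__ void inner_loop_vectorized(const float *in, float *out, int n) {
    int row = blockIdx.y;
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        out[row * n + j] = in[row * n + j] * 2.0f;  // stride-1 across the warp
}
```

Both kernels compute the same result; the difference is only in how loop iterations are mapped to threads, which is exactly the mapping-granularity choice the paper's framework automates.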