{"id":16149,"date":"2016-07-08T00:24:24","date_gmt":"2016-07-07T21:24:24","guid":{"rendered":"http:\/\/hgpu.org\/?p=16149"},"modified":"2016-07-08T00:24:24","modified_gmt":"2016-07-07T21:24:24","slug":"ttc-a-tensor-transposition-compiler-for-multiple-architectures","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=16149","title":{"rendered":"TTC: A Tensor Transposition Compiler for Multiple Architectures"},"content":{"rendered":"<p>We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open-source domain-specific parallel compiler. TTC generates optimized parallel C++\/CUDA C code that achieves a significant fraction of the system&#8217;s peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel&#8217;s Knights Corner, as well as different CUDA-based GPUs such as NVIDIA&#8217;s Kepler and Maxwell architectures. We report speedups of TTC over a meaningful baseline implementation generated by external C++ compilers; the results suggest that a domain-specific compiler can significantly outperform its general-purpose counterpart. For instance, compared with Intel&#8217;s latest C++ compiler on the Haswell and Knights Corner architectures, TTC yields speedups of up to $8\\times$ and $32\\times$, respectively. We also showcase TTC&#8217;s support for multiple leading dimensions, which makes it a suitable candidate for generating the performance-critical packing functions at the core of the ubiquitous BLAS 3 routines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open-source domain-specific parallel compiler. TTC generates optimized parallel C++\/CUDA C code that achieves a significant fraction of the system&#8217;s peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel&#8217;s Knights Corner [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[430,955,1782,14,1483,37,597,20,1899,176,67,1543],"class_list":["post-16149","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-blas","tag-compilers","tag-computer-science","tag-cuda","tag-intel-xeon-phi","tag-linear-algebra","tag-mathematical-software","tag-nvidia","tag-nvidia-geforce-840-m","tag-package","tag-performance","tag-tesla-k40"],"views":2289,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/16149","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16149"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/16149\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16149"}],"wp:term":[{"taxonomy":"category","embeddable":
true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16149"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}