Auto-Tuning of Data Communication on Heterogeneous Systems

Marc Jorda, Ivan Tanasic, Javier Cabezas, Lluis Vilanova, Isaac Gelado, Nacho Navarro
Barcelona Supercomputing Center
7th IEEE International Symposium on Embedded Multicore/Many-core System-on-Chip, 2013

@inproceedings{jorda2013autotuning,
   title={Auto-Tuning of Data Communication on Heterogeneous Systems},
   author={Jorda, Marc and Tanasic, Ivan and Cabezas, Javier and Vilanova, Lluis and Gelado, Isaac and Navarro, Nacho},
   booktitle={7th IEEE International Symposium on Embedded Multicore/Many-core System-on-Chip (MCSoC)},
   year={2013}
}




Heterogeneous systems formed by traditional CPUs and compute accelerators, such as GPUs, are becoming widely used to build modern supercomputers. However, many different system topologies, i.e., ways in which CPUs, accelerators, and I/O devices are interconnected, are being deployed. Each system organization presents different trade-offs when transferring data between CPUs, accelerators, and nodes within a cluster, and each requires a different software implementation to achieve optimal data communication bandwidth. Hence, there is great potential for auto-tuning applications to match the constraints of the system where the code is executed. In this paper we explore the potential impact of two key optimizations for achieving optimal data transfer bandwidth: topology-aware process placement policies and double-buffering. We design a set of experiments to evaluate all possible alternatives and run each of them on different hardware configurations. We show that the optimal data transfer mechanism depends not only on the hardware topology, but also on the application dataset. Our experimental results show that auto-tuning applications to match the hardware topology, together with double-buffering for large data transfers, can improve data transfer bandwidth by ~70% for local communication. We also show that double-buffering large data transfers is key to achieving optimal bandwidth in remote communication for transfers larger than 128KB, while topology-aware policies produce minimal benefits.
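The double-buffering optimization evaluated in the paper splits a large transfer into chunks and overlaps staging the next chunk with sending the current one, alternating between two staging buffers. The sketch below illustrates only that ping-pong control flow in plain, host-only C: `device_copy`, the chunk size, and all names are placeholders invented here (in a real system the send would be an asynchronous copy such as `cudaMemcpyAsync` on a CUDA stream, and the overlap would be genuine rather than sequential).

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK 4096  /* staging-buffer size; a tunable in practice */

/* Placeholder for an asynchronous device copy (e.g. cudaMemcpyAsync).
   Here it is a plain synchronous memcpy so the sketch runs host-only. */
static void device_copy(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
}

/* Double-buffered transfer: while chunk i is being sent to the device
   from one staging buffer, chunk i+1 is staged into the other, and the
   two buffers alternate roles (ping-pong). */
void transfer(char *dst, const char *src, size_t total) {
    char stage[2][CHUNK];
    size_t off = 0;          /* next source offset to stage */
    size_t pending = 0;      /* bytes staged but not yet sent */
    size_t pending_off = 0;  /* destination offset of the pending chunk */
    int cur = 0;             /* staging buffer to fill next */

    while (off < total || pending) {
        /* Send the previously staged chunk (from the other buffer). */
        if (pending) {
            device_copy(dst + pending_off, stage[cur ^ 1], pending);
            pending = 0;
        }
        /* Stage the next chunk while the send would be in flight. */
        if (off < total) {
            size_t n = total - off < CHUNK ? total - off : CHUNK;
            memcpy(stage[cur], src + off, n);
            pending = n;
            pending_off = off;
            off += n;
            cur ^= 1;
        }
    }
}
```

With a truly asynchronous `device_copy`, each iteration's staging `memcpy` overlaps the previous chunk's transfer, which is the source of the bandwidth gains the paper reports for transfers larger than 128KB.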

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units (see the specifications below). There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

The completed OpenCL project should be uploaded via the User dashboard (see instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
