
Towards Performance-Portable, Scalable, and Convenient Linear Algebra

Philippe Tillet, Karl Rupp, Siegfried Selberherr, Chin-Teng Lin
Institute for Microelectronics, TU Wien
5th USENIX Workshop on Hot Topics in Parallelism (HotPar’13), 2013
@inproceedings{tillet2013towards,
   title={Towards Performance-Portable, Scalable, and Convenient Linear Algebra},
   author={Tillet, Philippe and Rupp, Karl and Selberherr, Siegfried and Lin, Chin-Teng},
   booktitle={5th USENIX Workshop on Hot Topics in Parallelism (HotPar'13)},
   year={2013}
}

The rise of multi- and many-core architectures also gave birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses the heterogeneity of programming environments by providing a unified programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code compared to vendor-specific solutions, even when only simple operations are to be performed. Moreover, the unified programming framework does not by itself provide any guarantees on the performance portability of a particular implementation, so device-specific compute kernels are still required to obtain good performance across different hardware architectures. We address both issues, programmability and portable performance, in this work: on the one hand, a high-level programming interface for linear algebra routines allows for the convenient specification of the operations of interest without having to go into the details of the underlying hardware; on the other hand, we discuss the underlying generator, which produces device-specific OpenCL kernels at runtime and is supplemented by an auto-tuning framework for portable performance as well as by work partitioning and task scheduling for multiple devices. Our benchmark results show portable performance across hardware from major vendors: in all cases we obtained at least 75 percent of the performance of the respective vendor-tuned library, and in some cases we even outperformed the reference. We further demonstrate the convenient and efficient use of our high-level interface in a multi-device setting with good scalability.
