
MATLAB Parallelization through Scalarization

Chun-Yu Shei, Adarsh Yoga, Madhav Ramesh, Arun Chauhan
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
15th Workshop on Interaction between Compilers and Computer Architectures (INTERACT), 2011

@inproceedings{shei2011matlab,
   title     = {MATLAB Parallelization through Scalarization},
   author    = {Shei, Chun-Yu and Yoga, Adarsh and Ramesh, Madhav and Chauhan, Arun},
   booktitle = {15th Workshop on Interaction between Compilers and Computer Architectures (INTERACT)},
   year      = {2011}
}


While the popularity of high-level programming languages such as MATLAB for scientific and engineering applications continues to grow, MATLAB's poor performance compared to traditional languages such as Fortran or C continues to impede its deployment in full-scale simulations and data analysis. Poor memory behavior further limits the performance it can achieve. To ameliorate performance, we have been developing a MATLAB and Octave compiler that improves the performance of MATLAB code by performing type inference and using the resulting type information to remove common bottlenecks. We observe that, unlike past results, scalarizing array statements, rather than vectorizing scalar statements, is more fruitful when compiling MATLAB to C or C++. Two important situations where such scalarization helps are expressions containing array subscripts and sequences of related array statements. In both cases, it is possible to generate fused loops and replace array temporaries with scalars, thus reducing memory bandwidth pressure. Additional array temporaries are obviated in the case of array subscripts. Further, starting with vectorized statements guarantees that the resulting loops can be parallelized, creating opportunities for a mix of thread-level and instruction-level parallelism as well as GPU execution. We have implemented this strategy in a MATLAB compiler that compiles portions of MATLAB to C++ or CUDA C. Evaluation results on a set of benchmarks selected from diverse domains show speed improvements ranging from 1.5x to almost 17x on an eight-core Intel Core 2 Duo machine.
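To illustrate the scalarization idea the abstract describes, consider a vectorized MATLAB statement such as c = (a .* b) + (a ./ d); the statement and the variable names are hypothetical, not taken from the paper. A naive element-wise translation would allocate full-size temporary arrays for a .* b and a ./ d and make several passes over memory, whereas scalarizing and fusing the operations into a single C++ loop keeps the intermediates in scalars and yields a loop that can be parallelized. The sketch below is one possible hand-written form of such output, not the compiler's actual generated code.

#include <cstddef>
#include <vector>

// Scalarized, fused translation of the hypothetical MATLAB statement
//     c = (a .* b) + (a ./ d);
// Intermediates live in scalars instead of array temporaries, and the
// whole expression is evaluated in a single pass over the arrays.
void fused(const std::vector<double>& a,
           const std::vector<double>& b,
           const std::vector<double>& d,
           std::vector<double>& c) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    c.resize(static_cast<std::size_t>(n));
    // The fused loop is trivially parallel, e.g. via OpenMP threads.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        double t1 = a[i] * b[i];   // would otherwise be an array temporary
        double t2 = a[i] / d[i];   // would otherwise be an array temporary
        c[i] = t1 + t2;
    }
}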
