high performance computing on graphics processing units: hgpu.org


MATLAB Parallelization through Scalarization

Chun-Yu Shei, Adarsh Yoga, Madhav Ramesh, Arun Chauhan

School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA

15th Workshop on Interaction between Compilers and Computer Architectures (INTERACT), 2011

DOI:10.1109/INTERACT.2011.18

@inproceedings{shei2011matlab,
  title={MATLAB Parallelization through Scalarization},
  author={Shei, C.Y. and Yoga, A. and Ramesh, M. and Chauhan, A.},
  booktitle={15th Workshop on Interaction between Compilers and Computer Architectures (INTERACT), 2011},
  year={2011}
}


While high-level programming languages such as MATLAB continue to gain popularity for scientific and engineering applications, MATLAB's poor performance compared to traditional languages such as Fortran or C continues to impede its deployment in full-scale simulations and data analysis. Its poor memory behavior limits performance further. To address this, we have been developing a MATLAB and Octave compiler that improves the performance of MATLAB code by performing type inference and using the resulting type information to remove common bottlenecks. We observe that, unlike past results, scalarizing array statements, rather than vectorizing scalar statements, is more fruitful when compiling MATLAB to C or C++. Two important situations where such scalarization helps are expressions containing array subscripts and sequences of related array statements. In both cases, it is possible to generate fused loops and replace array temporaries with scalars, thus reducing memory bandwidth pressure. In the case of array subscripts, additional array temporaries are eliminated as well. Further, starting with vectorized statements guarantees that the resulting loops can be parallelized, creating opportunities for a mix of thread-level and instruction-level parallelism as well as GPU execution. We have implemented this strategy in a MATLAB compiler that compiles portions of MATLAB to C++ or CUDA C. Evaluation results on a set of benchmarks selected from diverse domains show speed improvements ranging from 1.5x to almost 17x on an eight-core Intel Core 2 Duo machine.
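To make the idea of fused loops and scalar temporaries concrete, here is a minimal C++ sketch for a MATLAB-style statement `d = a(idx) .* b + c`. It is an illustration only, not output of the authors' compiler: the function names `eval_with_array_temps` and `eval_scalarized` are hypothetical, the inputs `b`, `c`, and `idx` are assumed to have equal length, and `idx` is assumed to hold valid 0-based indices (MATLAB itself is 1-based).

```cpp
#include <cstddef>
#include <vector>

// Statement-by-statement translation of  d = a(idx) .* b + c.
// Each array operation is its own loop and writes a full-size temporary array.
std::vector<double> eval_with_array_temps(const std::vector<double>& a,
                                          const std::vector<std::size_t>& idx,
                                          const std::vector<double>& b,
                                          const std::vector<double>& c) {
    const std::size_t n = idx.size();
    std::vector<double> gathered(n);   // temporary for a(idx)
    for (std::size_t i = 0; i < n; ++i) gathered[i] = a[idx[i]];
    std::vector<double> product(n);    // temporary for a(idx) .* b
    for (std::size_t i = 0; i < n; ++i) product[i] = gathered[i] * b[i];
    std::vector<double> d(n);
    for (std::size_t i = 0; i < n; ++i) d[i] = product[i] + c[i];
    return d;
}

// Scalarized translation: one fused loop, with both array temporaries
// replaced by a scalar that can live in a register.
std::vector<double> eval_scalarized(const std::vector<double>& a,
                                    const std::vector<std::size_t>& idx,
                                    const std::vector<double>& b,
                                    const std::vector<double>& c) {
    const std::size_t n = idx.size();
    std::vector<double> d(n);
    // Iterations are independent because the source statement was vectorized,
    // so this loop is a candidate for threads, SIMD, or a GPU kernel.
    for (std::size_t i = 0; i < n; ++i) {
        const double t = a[idx[i]] * b[i];  // scalar temporary
        d[i] = t + c[i];
    }
    return d;
}
```

Both functions compute the same result; the second reads each input element once and writes only the output array, which is the reduction in memory traffic the abstract describes.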

Tags: Compilers, Computer science, CUDA, nVidia, Performance, Tesla C1060

July 11, 2011 by hgpu
