Performance Comparison with OpenMP Parallelization for Multi-core Systems
IEEE 9th International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2011
Multi-core processors now account for a growing share of the market, and programmers must confront the changes this shift brings. Semiconductor scaling limits, together with the associated power and thermal challenges, constrain performance growth for single-core microprocessors, which has led many vendors to adopt multi-core chip organizations instead. As a result, having the programmer or the compiler explicitly parallelize software is the key to exploiting the performance of multi-core chips; parallel processing is therefore both an opportunity and a challenge. In this paper, we ask whether there are effective ways to reduce the effort of rewriting code, or to parallelize programs automatically for multi-core execution, and whether doing so actually speeds up processing. We discuss tools that automatically generate OpenMP directives from serial C/C++ code, compare them against each other and against the original serial C/C++ code, and run the resulting programs on both a general-purpose computer and an embedded system. We also compare tools specifically designed to extract as much data parallelism as possible from C and FORTRAN kernels and translate them into NVIDIA CUDA or OpenCL, to measure how much faster the code runs after using them.
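For context, the transformation such tools perform resembles annotating an independent serial loop with an OpenMP parallel-for directive. The sketch below is only illustrative and is not taken from the paper's benchmarks; the vector-add loop, array names, and problem size are hypothetical.

/* Illustrative sketch (not from the paper): a serial vector-add loop and the
 * OpenMP directive an auto-parallelizing tool might emit for it.
 * Compile with: gcc -fopenmp vecadd.c -o vecadd */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));

    for (int i = 0; i < N; i++) {       /* serial initialization */
        a[i] = (double)i;
        b[i] = (double)(N - i);
    }

    /* The directive below is the kind such tools insert: the iterations are
     * independent, so they can be distributed across the available cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N/2] = %f (max threads: %d)\n", c[N / 2], omp_get_max_threads());

    free(a); free(b); free(c);
    return 0;
}

The same loop compiled without -fopenmp remains valid serial C, which is why directive-based parallelization is attractive for comparing serial and multi-core versions of the same source.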
August 8, 2011 by hgpu