Parallel paradigms in optimal structural design

Van Huyssteen, Salomon Stephanus
Department of Mechanical and Mechatronic Engineering, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa
University of Stellenbosch, 2011


   title={Parallel paradigms in optimal structural design},
   author={Van Huyssteen, S.S.},
   publisher={Stellenbosch: Stellenbosch University}





Modern-day processors are no longer getting faster. Because power consumption limits further frequency scaling, parallel processing is increasingly used to reduce computation time. In this thesis, several parallel paradigms are used to improve the performance of sequential approximate optimization (SAO) programs, which are commonly serial. Four novelties are discussed.

First, double precision solvers are replaced with single precision solvers. This is attempted in order to take advantage of the anticipated factor-of-two speed increase of single precision computations over double precision computations. However, the single precision routines show unpredictable performance characteristics and struggle to converge to the required accuracies, which is unfavourable for optimization solvers.

Second, the QP and dual statements are pitted against one another in a parallel environment. This is done because it is not always easy to see a priori which is best. Both are therefore started in parallel, and the competing thread is cancelled as soon as one returns a valid point. Parallel QP vs. dual statements prove to be very attractive: they converge within the minimum number of outer iterations, and the most appropriate solver is selected as the problem properties change during the iterations. Thread cancellation, however, poses problems. A thread can only be cancelled once it reaches an appropriate checkpoint, so unnecessarily long waits occur when a competing routine is struggling.

Third, multiple global searches are started in parallel on a shared memory system, and all problems see a speed increase of nearly 4x. Dynamically scheduled threads remove the need for a fixed number of threads, as is required in message passing implementations.

Lastly, existing matrix-vector multiplication routines are replaced with optimized BLAS routines. BLAS routines targeted at GPGPU (general-purpose computing on graphics processing units) technologies in particular prove to be superior when computing large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities, and speedups of up to 36x are recorded.
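
The accuracy problem behind the first novelty can be illustrated with a small experiment. The sketch below is not the thesis's solver: it runs a plain Jacobi iteration on a hypothetical 3x3 diagonally dominant system twice, once in native double precision and once with every operation rounded to IEEE single precision (emulated here with a `struct` round-trip, an assumption made to stay within the standard library). Single precision halves storage and memory traffic, which is where the hoped-for speedup comes from, but its residual stalls near the float32 rounding level instead of reaching a tight tolerance.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def jacobi(A, b, iters, rnd):
    """Jacobi iteration; `rnd` is applied after every arithmetic operation
    to emulate the working precision (identity = native double)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        new_x = []
        for i in range(n):
            s = rnd(b[i])
            for j in range(n):
                if j != i:
                    s = rnd(s - rnd(rnd(A[i][j]) * x[j]))
            new_x.append(rnd(s / rnd(A[i][i])))
        x = new_x
    return x

def residual_inf(A, b, x):
    """Max-norm of b - A x, always evaluated in double precision."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))

# Hypothetical test system (not from the thesis); exact solution is (5/28, 2/7, 19/28).
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]

x64 = jacobi(A, b, 100, lambda v: v)  # native double precision
x32 = jacobi(A, b, 100, to_f32)       # emulated single precision

r64 = residual_inf(A, b, x64)
r32 = residual_inf(A, b, x32)
print(f"double residual: {r64:.1e}")
print(f"single residual: {r32:.1e}")
```

Because 2/7 has no exact float32 representation, no single precision iterate can drive this residual below roughly 1e-8, however many iterations are run.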
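
The second novelty, racing the QP and dual statements and cancelling the loser, can be sketched with a thread pool and a shared cancellation flag. This is a minimal illustration under stated assumptions, not the thesis code: `make_solver` and its sleep-based iterations are hypothetical placeholders for real subproblem solvers. Python threads cannot be killed from outside, so each solver polls the flag at a checkpoint once per iteration. This mirrors the checkpoint delay described in the abstract (POSIX threads behave similarly under the default deferred cancellation type).

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def make_solver(name: str, iterations: int, step_time: float):
    """Hypothetical stand-in for a QP or dual subproblem solver."""
    def solve(cancel: threading.Event):
        for _ in range(iterations):
            if cancel.is_set():       # cancellation checkpoint: the only place this thread can stop
                return name, None
            time.sleep(step_time)     # placeholder for one solver iteration
        return name, f"{name} point"  # placeholder for a valid new iterate
    return solve

def race(solvers):
    """Start all solvers at once and keep the first valid result."""
    cancel = threading.Event()
    winner = None
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        pending = {pool.submit(solve, cancel) for solve in solvers}
        while pending and winner is None:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                name, point = fut.result()
                if point is not None and winner is None:
                    winner = (name, point)
        cancel.set()  # losers stop only when they reach their next checkpoint
    return winner     # leaving the `with` block has joined every worker thread

start = time.perf_counter()
winner = race([make_solver("QP", 200, 0.01), make_solver("dual", 5, 0.01)])
elapsed = time.perf_counter() - start
print(winner, f"{elapsed:.2f} s")
```

Here the hypothetical dual solver finishes first, and the QP thread exits at its next checkpoint rather than running its full 2 s. A solver that checks the flag rarely would hold up the `with` block for correspondingly longer.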
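
The third novelty, parallel global searches with dynamically scheduled threads, amounts to a multistart loop whose starting points are handed to whichever worker is free. The sketch below assumes a hypothetical one-dimensional objective and uses plain gradient descent in place of an SAO run. Submitting every start to a shared pool is the Python analogue of a dynamically scheduled parallel loop (for example OpenMP `schedule(dynamic)`): no start is tied to a particular thread in advance.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed

def f(x: float) -> float:
    """Hypothetical multimodal objective: one local and one global minimum."""
    return x * x + 10.0 * math.sin(x)

def df(x: float) -> float:
    return 2.0 * x + 10.0 * math.cos(x)

def local_search(x0: float, step: float = 0.05, tol: float = 1e-10, max_iter: int = 10_000):
    """Plain gradient descent from x0; stands in for one local optimization run."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x, f(x)

def multistart(starts, workers: int = 4):
    """Each idle worker picks up the next pending start (dynamic scheduling);
    the best local minimum found over all starts is returned."""
    best = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(local_search, x0) for x0 in starts]
        for fut in as_completed(futures):
            x, fx = fut.result()
            if best is None or fx < best[1]:
                best = (x, fx)
    return best

starts = [-10.0 + i for i in range(21)]  # 21 evenly spaced starting points
best_x, best_f = multistart(starts)
print(f"best x = {best_x:.4f}, f = {best_f:.4f}")
```

Python threads share one interpreter lock, so this shows the scheduling pattern rather than the speedup; for CPU-bound local searches, `ProcessPoolExecutor` is the usual drop-in replacement, which requires top-level functions and a `__main__` guard.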
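
For the last novelty, the key idea is that an iterative method's matrix-vector product can be isolated behind the BLAS GEMV contract, y <- alpha*A*x + beta*y, so that a hand-written loop can be swapped for an optimized library call. The pure-Python `gemv` below is only a reference for that contract (a real code would call something like `cblas_dgemv` on the CPU or `cublasDgemv` on the GPU). Power iteration serves as a hypothetical example of an iterative method in which the GEMV dominates the cost.

```python
import math

def gemv(alpha, A, x, beta, y):
    """Reference y <- alpha*A*x + beta*y for dense row-major A (no transpose).
    Reference BLAS does not read y when beta == 0; this sketch always does."""
    return [alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
            for row, y_i in zip(A, y)]

def power_iteration(A, iters: int = 100) -> float:
    """Dominant eigenvalue estimate; one GEMV per iteration is the main cost."""
    n = len(A)
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        y = gemv(1.0, A, x, 0.0, [0.0] * n)           # y = A x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = gemv(1.0, A, x, 0.0, [0.0] * n)
    return sum(xi * axi for xi, axi in zip(x, Ax))    # Rayleigh quotient x^T A x

lam = power_iteration([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues are 3 and 1
print(f"dominant eigenvalue ~ {lam:.6f}")
```

Keeping the call site in GEMV form is what makes the replacement cheap: only `gemv` changes when moving to an optimized CPU or GPU BLAS, while the surrounding iteration is untouched.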

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors