{"id":18454,"date":"2018-09-09T22:11:58","date_gmt":"2018-09-09T19:11:58","guid":{"rendered":"https:\/\/hgpu.org\/?p=18454"},"modified":"2018-09-09T22:11:58","modified_gmt":"2018-09-09T19:11:58","slug":"using-simd-and-simt-vectorization-to-evaluate-sparse-chemical-kinetic-jacobian-matrices-and-thermochemical-source-terms","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=18454","title":{"rendered":"Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms"},"content":{"rendered":"<p>Accurately predicting key combustion phenomena in reactive-flow simulations, e.g., lean blow-out, extinction\/ignition limits and pollutant formation, necessitates the use of detailed chemical kinetics. The large size and high levels of numerical stiffness typically present in chemical kinetic models relevant to transportation\/power-generation applications make the efficient evaluation\/factorization of the chemical kinetic Jacobian and thermochemical source-terms critical to the performance of reactive-flow codes. Here we investigate the performance of vectorized evaluation of constant-pressure\/volume thermochemical source-term and sparse\/dense chemical kinetic Jacobians using single-instruction, multiple-data (SIMD) and single-instruction, multiple thread (SIMT) paradigms. These are implemented in pyJac, an open-source, reproducible code generation platform. A new formulation of the chemical kinetic governing equations was derived and verified, resulting in Jacobian sparsities of 28.6-92.0% for the tested models. Speedups of 3.40-4.08x were found for shallow-vectorized OpenCL source-rate evaluation compared with a parallel OpenMP code on an avx2 central processing unit (CPU), increasing to 6.63-9.44x and 3.03-4.23x for sparse and dense chemical kinetic Jacobian evaluation, respectively. Furthermore, the effect of data-ordering was investigated and a storage pattern specifically formulated for vectorized evaluation was proposed; as well, the effect of the constant pressure\/volume assumptions and varying vector widths were studied on source-term evaluation performance. Speedups reached up to 17.60x and 45.13x for dense and sparse evaluation on the GPU, and up to 55.11x and 245.63x on the CPU over a first-order finite-difference Jacobian approach. Further, dense Jacobian evaluation was up to 19.56x and 2.84x times faster than a previous version of pyJac on a CPU and GPU, respectively.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Accurately predicting key combustion phenomena in reactive-flow simulations, e.g., lean blow-out, extinction\/ignition limits and pollutant formation, necessitates the use of detailed chemical kinetics. The large size and high levels of numerical stiffness typically present in chemical kinetic models relevant to transportation\/power-generation applications make the efficient evaluation\/factorization of the chemical kinetic Jacobian and thermochemical source-terms critical [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[90,3,12],"tags":[515,215,98,288,20,1793,176,1783,1226,1543],"class_list":["post-18454","post","type-post","status-publish","format-standard","hentry","category-opencl","category-paper","category-physics","tag-chemical-physics","tag-code-generation","tag-computational-physics","tag-factorization","tag-nvidia","tag-opencl","tag-package","tag-physics","tag-tesla-c2075","tag-tesla-k40"],"views":2491,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/18454","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=18454"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/18454\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=18454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=18454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=18454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}