{"id":6838,"date":"2012-01-05T17:39:22","date_gmt":"2012-01-05T15:39:22","guid":{"rendered":"http:\/\/hgpu.org\/?p=6838"},"modified":"2012-01-05T17:39:22","modified_gmt":"2012-01-05T15:39:22","slug":"selecting-the-best-tridiagonal-system-solver-projected-on-multi-core-cpu-and-gpu-platforms","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=6838","title":{"rendered":"Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms"},"content":{"rendered":"<p>Multi-core processors and graphics cards are now commodity hardware found in ordinary personal computers, and both CPUs and GPUs are capable of high-performance computation. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method as an example of fine-grained parallelism and Bondeli&#8217;s algorithm as an example of coarse-grained parallelism. Both algorithms are implemented for GPU architectures using CUDA and for multi-core, shared-memory CPU architectures using OpenMP. The results are compared in terms of execution time, speedup, and GFLOPS. For a large system of 2<sup>22<\/sup> equations, Bondeli&#8217;s algorithm gave the best results on multi-core CPU platforms (1.55x speedup, 0.84 GFLOPS), while cyclic reduction gave the best results on GPU platforms (17.06x speedup, 5.09 GFLOPS).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Multi-core processors and graphics cards are now commodity hardware found in ordinary personal computers, and both CPUs and GPUs are capable of high-performance computation. In this paper we present and compare parallel implementations of two tridiagonal system solvers. 
We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli&#8217;s algorithm, [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,11,89,3],"tags":[1787,1782,14,37,20,436,252],"class_list":["post-6838","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-nvidia-cuda","category-paper","tag-algorithms","tag-computer-science","tag-cuda","tag-linear-algebra","tag-nvidia","tag-nvidia-geforce-gtx-295","tag-openmp"],"views":2118,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/6838","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6838"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/6838\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6838"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6838"},{"taxonomy":"post_tag","embeddable":true,"href":"https:
\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6838"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}