{"id":7881,"date":"2012-07-11T15:00:34","date_gmt":"2012-07-11T12:00:34","guid":{"rendered":"http:\/\/hgpu.org\/?p=7881"},"modified":"2012-07-11T15:00:34","modified_gmt":"2012-07-11T12:00:34","slug":"hybrid-monte-carlo-with-wilson-dirac-operator-on-the-fermi-gpu","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=7881","title":{"rendered":"Hybrid Monte Carlo with Wilson Dirac operator on the Fermi GPU"},"content":{"rendered":"<p>In this article we present our implementation of a Hybrid Monte Carlo algorithm for Lattice Gauge Theory using two degenerate flavours of Wilson-Dirac fermions on a Fermi GPU. We find that using registers instead of global memory speeds up the code by almost an order of magnitude. To map the array variables to scalars, so that the compiler puts them in the registers, we use code generators. Our final program is more than 10 times faster than a generic single CPU.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article we present our implementation of a Hybrid Monte Carlo algorithm for Lattice Gauge Theory using two degenerate flavours of Wilson-Dirac fermions on a Fermi GPU. We find that using registers instead of global memory speeds up the code by almost an order of magnitude. To map the array variables to scalars, so [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[89,3,12],"tags":[14,110,72,20,1783,378,1333],"class_list":["post-7881","post","type-post","status-publish","format-standard","hentry","category-nvidia-cuda","category-paper","category-physics","tag-cuda","tag-high-energy-physics-lattice","tag-monte-carlo-simulation","tag-nvidia","tag-physics","tag-tesla-c2050","tag-tesla-x2090"],"views":2544,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7881","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7881"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7881\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7881"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7881"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7881"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}