{"id":9587,"date":"2013-06-14T22:39:44","date_gmt":"2013-06-14T19:39:44","guid":{"rendered":"http:\/\/hgpu.org\/?p=9587"},"modified":"2013-06-14T22:39:44","modified_gmt":"2013-06-14T19:39:44","slug":"real-space-density-functional-theory-on-graphical-processing-units-computational-approach-and-comparison-to-gaussian-basis-set-methods","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=9587","title":{"rendered":"Real-space density functional theory on graphical processing units: computational approach and comparison to Gaussian basis set methods"},"content":{"rendered":"<p>We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation GPUs from AMD and Nvidia, which show that our scheme, implemented in the free code OCTOPUS, can reach a sustained performance of up to 90 GFlops for a single GPU, representing an important speed-up when compared to the CPU version of the code. Moreover, for some systems our implementation can outperform a GPU Gaussian basis set code, showing that the real-space approach is a competitive alternative for DFT simulations on GPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[90,3,12],"tags":[7,1307,98,196,263,20,1793,176,1783,1390],"class_list":["post-9587","post","type-post","status-publish","format-standard","hentry","category-opencl","category-paper","category-physics","tag-ati","tag-ati-radeon-hd-7970","tag-computational-physics","tag-condensed-matter","tag-data-parallelism","tag-nvidia","tag-opencl","tag-package","tag-physics","tag-tesla-k20"],"views":2511,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/9587","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9587"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/9587\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9587"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9587"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9587"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}