{"id":19251,"date":"2019-12-29T15:36:16","date_gmt":"2019-12-29T13:36:16","guid":{"rendered":"https:\/\/hgpu.org\/?p=19251"},"modified":"2019-12-29T15:36:16","modified_gmt":"2019-12-29T13:36:16","slug":"accelerating-molecular-docking-by-parallelized-heterogeneous-computing-a-case-study-of-performance-quality-of-results-and-energy-efficiency-using-cpus-gpus-and-fpgas","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=19251","title":{"rendered":"Accelerating Molecular Docking by Parallelized Heterogeneous Computing &#8211; A Case Study of Performance, Quality of Results, and Energy-Efficiency using CPUs, GPUs, and FPGAs"},"content":{"rendered":"<p>Molecular Docking (MD) is a key tool in computer-aided drug design that aims to predict the binding pose between a small molecule and a macromolecular target. At its core, MD calculates the strength of possible binding poses, and searches for the energetically-stronger ones among those generated during simulation. Automatic Docking (AutoDock) is a widely-used MD code that employs a physics-based scoring function to quantify the binding strength. AutoDock also uses a Lamarckian Genetic Algorithm (LGA), which in turn uses the Solis-Wets method as its local-search algorithm, to find strong interactions of such molecular systems. Due to the highly-parallel nature of the LGA tasks involved, AutoDock can benefit from runtime acceleration based on parallelization. This thesis presents an OpenCL-based parallelization of AutoDock, and a corresponding evaluation in terms of execution performance, quality-of-results, and compute-energy efficiency, achieved on platforms based on multi-core Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). 
While a data-parallel approach has proven its effectiveness in accelerating AutoDock on CPUs and GPUs, it was observed that on FPGAs such an approach resulted in executions about three orders of magnitude slower than the original single-threaded AutoDock. To overcome this drawback, a task-parallel implementation for FPGAs is discussed as well. Besides presenting an AutoDock implementation parallelized using OpenCL, this thesis also extends the LGA search with new alternative local-search methods based on gradients of the scoring function: Steepest-Descent, FIRE, and ADADELTA. Among these, it was found that ADADELTA provides significant algorithmic benefits over Solis-Wets, reducing the calculation effort to as little as 1\/1300 of that of the legacy Solis-Wets method, while achieving equivalent quality-of-results. Compared to the original single-threaded AutoDock, the proposed data-parallel design achieves a speedup of up to ~399x and improves the compute-energy efficiency by up to ~297x when running on modern V100 GPUs. Furthermore, this thesis describes the adaptations performed on the proposed OpenCL-based implementation for supporting challenging real-world MD scenarios.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Molecular Docking (MD) is a key tool in computer-aided drug design that aims to predict the binding pose between a small molecule and a macromolecular target. At its core, MD calculates the strength of possible binding poses, and searches for the energetically-stronger ones among those generated during simulation. 
Automatic Docking (AutoDock) is a widely-used MD [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,90,3,12],"tags":[2040,7,1782,377,452,387,20,1793,176,1783,1963,390],"class_list":["post-19251","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","category-physics","tag-amd-radeon-rx-vega-56","tag-ati","tag-computer-science","tag-fpga","tag-heterogeneous-systems","tag-macromolecule","tag-nvidia","tag-opencl","tag-package","tag-physics","tag-tesla-v100","tag-thesis"],"views":2488,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/19251","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=19251"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/19251\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=19251"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=19251"},{"taxonomy":"post_tag
","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=19251"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}