{"id":9075,"date":"2013-03-26T03:24:53","date_gmt":"2013-03-26T01:24:53","guid":{"rendered":"http:\/\/hgpu.org\/?p=9075"},"modified":"2013-03-26T03:24:53","modified_gmt":"2013-03-26T01:24:53","slug":"experimental-fault-tolerant-synchronization-for-reliable-computation-on-graphics-processors","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=9075","title":{"rendered":"Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors"},"content":{"rendered":"<p>Graphics processors (GPUs) are emerging as a promising platform for highly parallel, compute-intensive, general-purpose computations, which usually need support for inter-process synchronization. Using the traditional lock-based synchronization (e.g. mutual exclusion) makes the computation vulnerable to faults caused by both scientists&#8217; inexperience and hardware transient errors. It is notoriously difficult for scientists to deal with deadlocks when their computation needs to lock many objects concurrently. Hardware transient errors may make a process, which is holding a lock, stop progressing (or crash). While such hardware transient errors are a non-issue for graphics processors used by graphics computation (e.g. an error in a single pixel may not be noticeable), this no longer holds for graphics processors used for scientific computation. Such scientific computation requires a fault-tolerant synchronization mechanism. However, most of the powerful GPUs aimed at high-performance computing (e.g. NVIDIA Tesla series) do not support any strong synchronization primitives like test-andset and compare-and-swap, which are usually used to construct fault-tolerant synchronization mechanisms. This paper presents an experimental study of fault-tolerant synchronization mechanisms for NVIDIA&#8217;s Compute Unified Device Architecture (CUDA) without the need of strong synchronization primitives in hardware. We implement a lockfree synchronization mechanism that eliminates lock-related problems like the deadlock and, moreover, can tolerate process crash-failure.We address the experimental issues that arise in the implementation of the mechanism and evaluate its performance on commodity NVIDIA GeForce 8800 graphics cards.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Graphics processors (GPUs) are emerging as a promising platform for highly parallel, compute-intensive, general-purpose computations, which usually need support for inter-process synchronization. Using the traditional lock-based synchronization (e.g. mutual exclusion) makes the computation vulnerable to faults caused by both scientists&#8217; inexperience and hardware transient errors. It is notoriously difficult for scientists to deal with deadlocks [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[1782,14,1189,20,357],"class_list":["post-9075","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-computer-science","tag-cuda","tag-fault-tolerance","tag-nvidia","tag-nvidia-geforce-8800-gts"],"views":2310,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/9075","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9075"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/9075\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9075"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9075"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9075"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}