{"id":1021,"date":"2010-10-29T11:07:26","date_gmt":"2010-10-29T11:07:26","guid":{"rendered":"http:\/\/hgpu.org\/?p=1021"},"modified":"2010-10-29T11:07:26","modified_gmt":"2010-10-29T11:07:26","slug":"an-analytical-model-for-a-gpu-architecture-with-memory-level-and-thread-level-parallelism-awareness","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=1021","title":{"rendered":"An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness"},"content":{"rendered":"<p>GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers to tune their applications by exploiting the design space exhaustively without fully understanding the performance characteristics of their applications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[153,1782,14,222,20,226,183,224,223],"class_list":["post-1021","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-analytical-model","tag-computer-science","tag-cuda","tag-memory-level-parallelism","tag-nvidia","tag-nvidia-geforce-8800-gt","tag-nvidia-geforce-8800-gtx","tag-nvidia-quadro-fx-5600","tag-warp-level-parallelism"],"views":2916,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/1021","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1021"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/1021\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1021"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1021"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1021"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}