{"id":8228,"date":"2012-09-18T15:11:30","date_gmt":"2012-09-18T12:11:30","guid":{"rendered":"http:\/\/hgpu.org\/?p=8228"},"modified":"2012-09-18T15:11:30","modified_gmt":"2012-09-18T12:11:30","slug":"the-architecture-and-evolution-of-cpu-gpu-systems-for-general-purpose-computing","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=8228","title":{"rendered":"The Architecture and Evolution of CPU-GPU Systems for General Purpose Computing"},"content":{"rendered":"<p>GPU computing has emerged in recent years as a viable execution platform for throughput-oriented applications or regions of code. GPUs started out as independent units for program execution but there are clear trends towards tight-knit CPU-GPU integration. In this work, we will examine existing research directions and future opportunities for chip-integrated CPU-GPU systems. We first seek to understand state-of-the-art GPU architectures and examine GPU design proposals to reduce performance loss caused by SIMT thread divergence. Next, we motivate the need for new CPU design directions for CPU-GPU systems by discussing our work in the area. We examine proposals as to how shared components such as last-level caches and memory controllers could be evolved to improve the performance of CPU-GPU systems. We then look at collaborative CPU-GPU execution schemes. Lastly, we discuss future work directions and research opportunities for CPU-GPU systems.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>GPU computing has emerged in recent years as a viable execution platform for throughput-oriented applications or regions of code. GPUs started out as independent units for program execution but there are clear trends towards tight-knit CPU-GPU integration. In this work, we will examine existing research directions and future opportunities for chip-integrated CPU-GPU systems. 
[&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,3],"tags":[1782,452,31],"class_list":["post-8228","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","tag-computer-science","tag-heterogeneous-systems","tag-review"],"views":2700,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8228"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8228\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\
/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}