{"id":3490,"date":"2011-04-07T20:39:07","date_gmt":"2011-04-07T20:39:07","guid":{"rendered":"http:\/\/hgpu.org\/?p=3490"},"modified":"2011-04-07T20:39:07","modified_gmt":"2011-04-07T20:39:07","slug":"using-gpu-to-accelerate-cache-simulation","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=3490","title":{"rendered":"Using GPU to Accelerate Cache Simulation"},"content":{"rendered":"<p>Caches play a major role in the performance of high speed computer systems. Trace driven simulator is the most widely used method to evaluate cache architectures. However, as the cache design moves to more complicated architectures, along with the size of the trace is becoming larger and larger. Traditional simulation methods are no longer practical due to their long simulation cycles. Several techniques have been proposed to reduce the simulation time of sequential trace driven simulation. This paper considers the use of generic GPU to accelerate cache simulation which exploits set partitioning as the main source of parallelism. We develop more efficient parallel simulation techniques by introducing more knowledge into the Compute Unified Device Architecture (CUDA) on the GPU. Our experimental result shows that the new algorithm gains 2.76 times performance improvement compared to traditional CPU based sequential algorithm.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Caches play a major role in the performance of high speed computer systems. Trace driven simulator is the most widely used method to evaluate cache architectures. However, as the cache design moves to more complicated architectures, along with the size of the trace is becoming larger and larger. Traditional simulation methods are no longer practical [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,11,89,3],"tags":[1787,1782,14,884,20,298],"class_list":["post-3490","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-computer-science","category-nvidia-cuda","category-paper","tag-algorithms","tag-computer-science","tag-cuda","tag-memory","tag-nvidia","tag-optimization"],"views":1946,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3490","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3490"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3490\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3490"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3490"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3490"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}