{"id":3081,"date":"2011-03-03T13:44:14","date_gmt":"2011-03-03T13:44:14","guid":{"rendered":"http:\/\/hgpu.org\/?p=3081"},"modified":"2011-03-03T13:44:14","modified_gmt":"2011-03-03T13:44:14","slug":"building-correlators-with-many-core-hardware","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=3081","title":{"rendered":"Building Correlators with Many-Core Hardware"},"content":{"rendered":"<p>Radio telescopes typically consist of multiple receivers whose signals are cross-correlated to filter out noise. A recent trend is to correlate in software instead of custom-built hardware, taking advantage of the flexibility that software solutions offer. Examples include e-VLBI and LOFAR. However, the data rates are usually high and the processing requirements challenging. Many-core processors are promising devices to provide the required processing power. In this paper, we explain how to implement and optimize signal-processing applications on multi-core CPUs and many-core architectures, such as the Intel Core i7, NVIDIA and ATI GPUs, and the Cell\/B.E. We use correlation as a running example. The correlator is a streaming, possibly real-time application, and is much more I\/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene\/P supercomputer. We discuss several important architectural problems which cause architectures to perform suboptimally, and also deal with programmability. The correlator on the Blue Gene\/P achieves a superb 96% of the theoretical peak performance. We show that the processing power and memory bandwidth of current GPUs are highly imbalanced. Because of this, the correlator achieves only 16% of the peak on ATI GPUs, and 32% on NVIDIA GPUs. The Cell\/B.E. processor, in contrast, achieves an excellent 92%. 
Many of the insights we discuss here are not only applicable to telescope correlators, but are valuable when developing signal-processing applications in general.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Radio telescopes typically consist of multiple receivers whose signals are cross-correlated to filter out noise. A recent trend is to correlate in software instead of custom-built hardware, taking advantage of the flexibility that software solutions offer. Examples include e-VLBI and LOFAR. However, the data rates are usually high and the processing requirements challenging. Many-core processors [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[96,88,89,3,12,41],"tags":[1794,7,380,84,255,1792,545,14,20,1783,1789,199],"class_list":["post-3081","post","type-post","status-publish","format-standard","hentry","category-astrophysics","category-ati-stream","category-nvidia-cuda","category-paper","category-physics","category-signal-processing","tag-astrophysics","tag-ati","tag-ati-cal","tag-ati-il","tag-ati-radeon-hd-4870","tag-ati-stream","tag-cell-processor","tag-cuda","tag-nvidia","tag-physics","tag-signal-processing","tag-tesla-c1060"],"views":2132,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3081","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/
types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3081"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3081\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3081"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3081"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3081"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}