{"id":13776,"date":"2015-03-23T20:32:42","date_gmt":"2015-03-23T18:32:42","guid":{"rendered":"http:\/\/hgpu.org\/?p=13776"},"modified":"2015-03-23T20:32:42","modified_gmt":"2015-03-23T18:32:42","slug":"gpu-kernels-for-high-speed-4-bit-astrophysical-data-processing","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=13776","title":{"rendered":"GPU Kernels for High-Speed 4-Bit Astrophysical Data Processing"},"content":{"rendered":"<p>Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded real and imaginary integers. Because of the low bit width of the data, two values may be packed into a 32-bit register, allowing multiplication and addition of more than one value with a single fused multiply-add instruction. With this data and calculation packing scheme, as many as 5.6 effective tera-operations per second (TOPS) can be executed on a 4.3 TOPS GPU. The kernel design allows correlations to scale to large numbers of input elements, limited only by maximum buffer sizes on the GPU. This code is currently working on-sky with the CHIME Pathfinder Correlator in BC, Canada.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[96,90,3],"tags":[1723,1794,7,97,1793,176],"class_list":["post-13776","post","type-post","status-publish","format-standard","hentry","category-astrophysics","category-opencl","category-paper","tag-amd-radeon-r9-280x","tag-astrophysics","tag-ati","tag-instrumentation-and-methods-for-astrophysics","tag-opencl","tag-package"],"views":2436,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/13776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13776"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/13776\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}