{"id":8190,"date":"2012-09-12T13:17:25","date_gmt":"2012-09-12T10:17:25","guid":{"rendered":"http:\/\/hgpu.org\/?p=8190"},"modified":"2012-09-12T13:17:25","modified_gmt":"2012-09-12T10:17:25","slug":"mapping-a-dataflow-programming-model-onto-heterogeneous-architectures","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=8190","title":{"rendered":"Mapping a Dataflow Programming Model onto Heterogeneous Architectures"},"content":{"rendered":"<p>This thesis describes and evaluates how extending Intel&#8217;s Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of data-flow programming. The CnC model is a declarative, dynamic light-weight task based parallel programming model and is implicitly deterministic by enforcing the single assignment rule, properties which ensure that problems are modelled in an intuitive way. CnC offers a separation of concerns by allowing algorithms to be expressed as a two stage process: first by decomposing a problem into components and specifying how components interact with each other, and second by providing an implementation for each component. By facilitating the separation between a domain expert, who can provide an accurate problem specification at a high level, and a tuning expert, who can tune the individual components for better performance, we ensure that tuning and future development, such as replacement of a subcomponent with a more efficient algorithm, become straightforward. A recent trend in mainstream desktop systems is the use of graphics processor units (GPUs) to obtain order-of-magnitude performance improvements relative to general-purpose CPUs. In addition, the use of FPGAs has seen a significant increase for applications that can take advantage of such dedicated hardware. We see that computing is evolving from using many core CPUs to &quot;co-processing&quot; on the CPU, GPU and FPGA, however hybrid programming models that support the interaction between multiple heterogeneous components are not widely accessible to mainstream programmers and domain experts who have a real need for such resources. We propose a C-based implementation of the CnC model for enabling parallelism across heterogeneous processor components in a flexible way, with high resource utilization and high programmability. We use the task-parallel HabaneroC language (HC) as the platform for implementing CnC-HabaneroC (CnC-HC), a language also used to implement the computation steps in CnC-HC, for interaction with GPU or FPGA steps and which offers the desired flexibility and extensibility of interacting with any other C based language. First, we extend the CnC model with tag functions and ranges to enable automatic code generation of high level operations for inter-task communication. This improves programmability and also makes the code more analysable, opening the door for future optimizations. Secondly, we introduce a way to specify steps that are data parallel and thus are fit to execute on the GPU, and the notion of task affinity, a tuning annotation in the specification language. Affinity is used by the runtime during scheduling and can be fine-tuned based on application needs to achieve better (faster, lower power, etc.) results. Thirdly, we introduce and develop a novel, data-driven runtime for the CnC model, using HabaneroC (HC) as a base language. In addition, we also create an implementation of the previous runtime approach and conduct a study to compare the performance. Next, we expand the HabaneroC dynamic work-stealing runtime to allow cross-device stealing based on task affinity. Cross-device dynamic work-stealing is used to achieve load balancing across heterogeneous platforms for improved performance. Finally, we implement and use a series of benchmarks for testing the model in different scenarios and show that our proposed approach can yield significant performance benefits and low power usage when using a hybrid execution.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This thesis describes and evaluates how extending Intel&#8217;s Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of data-flow programming. The CnC model is a declarative, dynamic light-weight task based parallel programming model and is implicitly deterministic by enforcing [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[215,1782,14,452,20,1034,199,390],"class_list":["post-8190","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-code-generation","tag-computer-science","tag-cuda","tag-heterogeneous-systems","tag-nvidia","tag-nvidia-quadro-fx-580","tag-tesla-c1060","tag-thesis"],"views":2547,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8190","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8190"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8190\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8190"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8190"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8190"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}