high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Data-Driven Programming Abstractions and Optimization for Multi-Core Platforms

Data-Driven Programming Abstractions and Optimization for Multi-Core Platforms

Rebecca L. Collins

School of Arts and Sciences, Columbia University

Columbia University, 2011

BibTeX

Download (PDF)

View

Source

1891

views

Multi-core platforms have spread to all corners of the computing industry, and trends in design and power indicate that the shift to multi-core will become even widerspread in the future. As the number of cores on a chip rises, the complexity of memory systems and on-chip interconnects increases drastically. The programmer inherits this complexity in the form of new responsibilities for task decomposition, synchronization, and data movement within an application, which hitherto have been concealed by complex processing pipelines or deemed unimportant since tasks were largely executed sequentially. To some extent, the need for explicit parallel programming is inevitable, due to limits in the instruction-level parallelism that can be automatically extracted from a program. However, these challenges create a great opportunity for the development of new programming abstractions which hide the low-level architectural complexity while exposing intuitive high-level mechanisms for expressing parallelism. Many models of parallel programming fall into the category of data-centric models, where the structure of an application depends on the role of data and communication in the relationships between tasks. The utilization of the inter-core communication networks and effective scaling to large data sets are decidedly important in designing efficient implementations of parallel applications. The questions of how many low-level architectural details should be exposed to the programmer, and how much parallelism in an application a programmer should expose to the compiler remain open-ended, with different answers depending on the architecture and the application in question. I propose that the key to unlocking the capabilities of multi-core platforms is the development of abstractions and optimizations which match the patterns of data movement in applications with the inter-core communication capabilities of the platforms. After a comparative analysis that confirms and stresses the importance of finding a good match between the programming abstraction, the application, and the architecture, this dissertation proposes two techniques that showcase the power of leveraging data dependency patterns in parallel performance optimizations. Flexible Filters dynamically balance load in stream programs by creating flexibility in the runtime data flow through the addition of redundant stream filters. This technique combines a static mapping with dynamic flow control to achieve light-weight, distributed and scalable throughput optimization. The properties of stream communication, i.e., FIFO pipes, enable flexible filters by exposing the backpressure dependencies between tasks. Next, I present Huckleberry, a novel recursive programming abstraction developed in order to allow programmers to expose data locality in divide-and-conquer algorithms at a high level of abstraction. Huckleberry automatically converts sequential recursive functions with explicit data partitioning into parallel implementations that can be ported across changes in the underlying architecture including the number of cores and the amount of on-chip memory. I then present a performance model for multicore applications which provides an efficient means to evaluate the trade-offs between the computational and communication requirements of applications together with the hardware resources of a target multi-core architecture. The model encompasses all data-driven abstractions that can be reduced to a task graph representation and is extensible to performance techniques such as Flexible Filters that alter an application’s original task graph. Flexible Filters and Huckleberry address the challenges of parallel programming on multi-core architectures by taking advantage of properties specific to the stream and recursive paradigms, and the performance model creates a unifying framework based on the communication between tasks in parallel applications. Combined, these contributions demonstrate that specialization with respect to communication patterns enhances the ability of parallel programming abstractions and optimizations to harvest the power of multi-core platforms.

Tags: Algorithms, Cell processor, Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, Optimization, Programming techniques, Thesis

October 17, 2011 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Data-Driven Programming Abstractions and Optimization for Multi-Core Platforms

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Most viewed papers (last 30 days)

Data-Driven Programming Abstractions and Optimization for Multi-Core Platforms

Share this:

Recent source codes

Most viewed papers (last 30 days)