Collection skeletons: declarative abstractions for data collections
The University of Edinburgh
The University of Edinburgh, 2024
DOI:10.7488/era/4908
@article{li2024collection,
title={Collection skeletons: declarative abstractions for data collections},
author={Li, Zhibo},
year={2024},
publisher={The University of Edinburgh}
}
Modern programming languages provide programmers with rich abstractions for data collections as part of their standard libraries, e.g., Containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. Typically, these collections frameworks are organised as hierarchies that provide programmers with common abstract data types (ADTs) like lists, queues, and stacks. While convenient, this approach introduces problems which ultimately affect application performance due to users overspecifying collection data types limiting implementation flexibility. On the other hand, with the development of parallel computing, there are increasingly parallel architectures that provide parallel speedup for diverse applications such as big data and artificial intelligence, however, there is still a substantial gap between the parallel hardware and writing parallel code. Additionally, the overspecification issue makes it harder for programmers, and particularly non-professional programmers to develop parallel programs. This work develops Collection Skeletons which provide a novel, declarative approach to data collections. Using our framework, programmers explicitly select properties for their collections, thereby truly decoupling specification from implementation. By making collection properties explicit, immediate benefits materialise in the forms of reduced risk of overspecification and increased implementation flexibility. A prototype library of the declarative abstractions for collections is presented in this thesis and shows that benchmark applications rewritten to use Collection Skeletons incur little or no overhead for sequential circumstances. In fact, performance improvements have been reported across most of the 17 benchmarks resulting from the use of Collection Skeletons before trying to parallelize those benchmarks, while also enhancing performance portability across three different hardware platforms. Additionally, by extending the Collection Skeletons towards parallelization, this work shows that Collection Skeletons help shielding the application developer from parallel implementation details, either by encapsulating implicit parallelism or through explicit properties that capture the requirements of parallel algorithmic skeletons. The Collection Skeletons help the programmers write better (parallel) code by providing a property-based declaration for data collections while creating parallel opportunities for library designers who can transparently improve the overall program performance without requiring much effort from the programmers.
September 22, 2024 by hgpu