
Dynamic Memory Allocation for OpenCL

Nadir Gamal Abdelrahim Salih
School of Informatics, University of Edinburgh
University of Edinburgh, 2014

@article{salih2014dynamic,
   title={Dynamic Memory Allocation for OpenCL},
   author={Salih, Nadir Gamal Abdelrahim},
   year={2014}
}


Heterogeneous systems are computer systems that combine multiple devices with different processor architectures and improve computing efficiency by offloading each workload to the device that suits it best. OpenCL is a framework for building portable applications that run across the different devices in such systems, and it has gained traction as a powerful tool for high-performance computing. However, it lacks dynamic memory allocation, one of the most useful features of modern programming languages and frameworks. Dynamic memory allocation lets the programmer determine memory requirements at run time. This makes it easier to implement applications that adapt to changing inputs or handle irregular workloads whose memory requirements are hard to determine before the program starts running. We attempt to implement a dynamic memory allocator in OpenCL that provides this feature with minimal overhead while delivering reliable and consistent allocations. We used lock-free algorithms to build three allocators. The first is a naive allocator that does not support deallocation but provides maximum performance. The second uses segregated lists and supports deallocation, at the cost of significant overhead. The third is an optimised two-level allocator that provides decent performance, given suitable initialisation, and can support deallocation. We chose lock-free algorithms because they tend to be more efficient on massively parallel systems such as GPUs, which run thousands of threads and on which complex synchronisation techniques such as locking tend to be inefficient or infeasible. The optimised two-level allocator borrows a technique from Hoard, an earlier parallel dynamic memory allocator, and divides the allocation process into a global, thread-safe phase and a local phase.
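The abstract describes the naive allocator only as lock-free and without deallocation. A common way to obtain exactly those two properties is a bump-pointer allocator driven by a single atomic offset. The following is a minimal sketch under that assumption, not the thesis's code: it is written as host C11 with <stdatomic.h> so it runs outside OpenCL, whereas in a kernel the heap would be a __global buffer and the fetch-add would map to OpenCL C's atomic_add on a __global counter. The names naive_malloc, naive_reset, HEAP_BYTES and ALIGN are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; the thesis does not state its heap size or alignment. */
#define HEAP_BYTES 4096u
#define ALIGN 16u

/* Stand-in for a __global buffer the host would create before the kernel runs. */
static _Alignas(16) unsigned char heap[HEAP_BYTES];
static atomic_size_t heap_top = 0; /* offset of the next free byte */

/* Round n up to a multiple of ALIGN (ALIGN must be a power of two). */
static size_t round_up(size_t n) {
    assert((ALIGN & (ALIGN - 1)) == 0);
    return (n + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
}

/* Lock-free bump allocation: one atomic fetch-add claims a disjoint byte
   range, so concurrent callers never receive overlapping blocks. There is
   no matching free(); memory is reclaimed only by resetting the whole heap. */
void *naive_malloc(size_t n) {
    if (n == 0 || n > HEAP_BYTES)
        return NULL;
    size_t size = round_up(n);
    size_t offset = atomic_fetch_add(&heap_top, size);
    if (offset > HEAP_BYTES || size > HEAP_BYTES - offset)
        return NULL; /* out of space; heap_top is left past the end */
    return heap + offset;
}

/* Hypothetical helper: discard every allocation at once (e.g. between launches). */
void naive_reset(void) {
    atomic_store(&heap_top, 0);
}
```

Each call costs a single atomic operation, which fits the abstract's finding that this design is the fastest; the trade-off is that an individual block can never be returned to the heap.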
By testing these allocators, we find that the naive allocator offers the best performance, but because it cannot support deallocation, the two-level allocator offers better overall value. We argue that the two-level allocator has potential for use in real-world applications once deallocation is implemented: although its performance can be sluggish, it improves significantly when calls to the allocator are spaced out with other instructions in between. Performance also improves drastically with our initialisation technique, which allocates initial local space in the host application.
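The Hoard-style split into a global, thread-safe phase and a local phase, together with host-side initialisation of local space, can be illustrated with the minimal sketch below. It is an assumption-laden illustration, not the thesis's allocator: work-items are emulated by integer thread indices, the global phase hands out fixed-size chunks with one atomic fetch-add, and the local phase bumps a private offset inside the owning thread's chunk without any atomics. The tl_init function stands in for the host pre-allocating an initial chunk per thread, so first allocations skip the global phase. Deallocation is omitted, and all names and sizes (tl_malloc, tl_init, TL_CHUNK_BYTES, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sizes; the thesis does not specify these values. */
#define TL_HEAP_BYTES (64u * 1024u)
#define TL_CHUNK_BYTES 1024u
#define TL_MAX_THREADS 32u
#define TL_ALIGN 16u

static _Alignas(16) unsigned char tl_heap[TL_HEAP_BYTES];
static atomic_size_t tl_global_top = 0; /* global phase: shared and atomic */

/* Local phase state: each entry is used only by its owning thread,
   so plain (non-atomic) fields are sufficient. */
typedef struct {
    size_t base; /* offset of this thread's current chunk */
    size_t used; /* bytes already handed out from that chunk */
    int valid;   /* 0 until a chunk has been assigned */
} local_heap;

static local_heap tl_local[TL_MAX_THREADS];

/* Global phase: atomically claim a fresh chunk for thread tid. */
static int claim_chunk(unsigned tid) {
    size_t off = atomic_fetch_add(&tl_global_top, TL_CHUNK_BYTES);
    if (off > TL_HEAP_BYTES - TL_CHUNK_BYTES)
        return 0; /* global heap exhausted; local state left unchanged */
    tl_local[tid].base = off;
    tl_local[tid].used = 0;
    tl_local[tid].valid = 1;
    return 1;
}

/* Stand-in for host-side initialisation: give each of the first
   nthreads threads an initial chunk before any allocation happens. */
void tl_init(unsigned nthreads) {
    assert(nthreads <= TL_MAX_THREADS);
    atomic_store(&tl_global_top, 0);
    for (unsigned t = 0; t < TL_MAX_THREADS; ++t)
        tl_local[t].valid = 0;
    for (unsigned t = 0; t < nthreads; ++t)
        claim_chunk(t);
}

void *tl_malloc(unsigned tid, size_t n) {
    if (tid >= TL_MAX_THREADS || n == 0 || n > TL_CHUNK_BYTES)
        return NULL;
    size_t size = (n + TL_ALIGN - 1) & ~(size_t)(TL_ALIGN - 1);
    local_heap *lh = &tl_local[tid];
    /* Fall back to the global phase only when the local chunk is missing or full. */
    if (!lh->valid || lh->used + size > TL_CHUNK_BYTES) {
        if (!claim_chunk(tid))
            return NULL;
    }
    /* Local phase: bump inside the thread's own chunk, no synchronisation. */
    void *p = tl_heap + lh->base + lh->used;
    lh->used += size;
    return p;
}
```

In this design most calls touch only thread-private state, and the shared atomic counter is hit once per chunk rather than once per allocation, which is one plausible reason why pre-assigning local space on the host would reduce contention.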
