high performance computing on graphics processing units: hgpu.org

Dynamic Memory Allocation for OpenCL

Nadir Gamal Abdelrahim Salih

School of Informatics, University of Edinburgh

University of Edinburgh, 2014

Heterogeneous systems are computer systems that exploit multiple devices with different processor architectures to improve computing efficiency by offloading each workload to the device that fits it best. OpenCL is a framework for building portable applications that run across different devices in heterogeneous systems. It has gained traction as a powerful tool for high-performance computing. However, it lacks dynamic memory allocation, which is one of the most useful features in modern programming languages and frameworks. Dynamic memory allocation lets the programmer determine memory requirements at run-time. This facilitates the implementation of applications that adapt to changing inputs and deal with irregular workloads, for which memory requirements are difficult to determine before the program starts running. We attempt to implement a dynamic memory allocator in OpenCL that provides this feature with minimal overhead while providing reliable and consistent allocations. Lock-free algorithms were used to build three allocators. The first is a naive allocator that does not allow for deallocation but provides maximum performance. The second is an allocator that uses segregated lists and can support deallocation, but incurs significant overhead. Finally, the third is an optimised two-level allocator that can provide decent performance, with suitable initialisation, and can support deallocation. We used lock-free algorithms because they tend to be more efficient in massively parallel systems such as GPUs, where thousands of threads can run, while complex synchronisation techniques such as locking tend to be inefficient or infeasible. The optimised two-level allocator uses a technique borrowed from previous attempts at parallel dynamic memory allocation, namely Hoard, to divide the allocation process into a global, thread-safe phase and a local phase.
By testing these allocators we find that the naive allocator offers the best performance, but because it cannot support deallocation, the two-level allocator offers better overall value. We argue that the two-level allocator has potential for use in real-world applications once deallocation is implemented. This is because, although its performance can be sluggish, it improves significantly when calls to the allocator are spaced out with other instructions in between. The performance also improves drastically with our initialisation technique, which allocates initial local space in the host application.
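
As an illustration of the first design described above (a lock-free allocator with no deallocation), the following is a minimal sketch in host-side C11 and is not the thesis's implementation. The names `bump_heap`, `bump_init` and `bump_alloc`, the 8-byte rounding, and the use of a compare-and-swap loop are all assumptions made for this example; an OpenCL C version would presumably rely on the built-in atomics (for example `atomic_cmpxchg`) on a counter in global memory instead of `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical heap descriptor: a fixed pool plus an atomic bump offset. */
typedef struct {
    unsigned char *base;     /* start of the pre-allocated pool */
    size_t         capacity; /* pool size in bytes */
    atomic_size_t  offset;   /* next free byte, advanced atomically */
} bump_heap;

/* Bind the heap to a caller-provided pool; nothing is ever freed. */
static void bump_init(bump_heap *h, void *pool, size_t capacity) {
    assert(pool != NULL);
    h->base = pool;
    h->capacity = capacity;
    atomic_init(&h->offset, 0);
}

/* Lock-free allocation: each caller claims a disjoint byte range by
 * advancing the shared offset with compare-and-swap. Returns NULL when
 * the request is empty or the pool cannot satisfy it. */
static void *bump_alloc(bump_heap *h, size_t size) {
    const size_t align = 8; /* assumed granularity for this sketch */
    if (size == 0 || size > h->capacity) {
        return NULL;
    }
    size_t rounded = (size + align - 1) & ~(align - 1);
    size_t old = atomic_load(&h->offset);
    do {
        /* old never exceeds capacity, so the subtraction cannot wrap. */
        if (rounded > h->capacity - old) {
            return NULL;
        }
        /* On failure, old is reloaded with the current offset and we retry. */
    } while (!atomic_compare_exchange_weak(&h->offset, &old, old + rounded));
    return h->base + old;
}
```

Because the only shared state is a single counter, no thread ever blocks another, which is consistent with the abstract's observation that this style of allocator is the fastest. The cost is that space is never reclaimed, which is the gap the segregated-list and two-level designs aim to close.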

Tags: ATI, ATI Radeon HD 7400 M, Computer science, Memory model, OpenCL, Thesis

August 31, 2015 by hgpu
