high performance computing on graphics processing units: hgpu.org

Dynamic Memory Allocation for OpenCL

Nadir Gamal Abdelrahim Salih

School of Informatics, University of Edinburgh

University of Edinburgh, 2014

@article{salih2014dynamic,
  title={Dynamic Memory Allocation for OpenCL},
  author={Salih, Nadir Gamal Abdelrahim},
  year={2014}
}

Heterogeneous systems are computer systems that combine multiple devices with different processor architectures and improve computing efficiency by offloading each workload to the device that fits it best. OpenCL is a framework for building portable applications that run across the different devices in such systems, and it has gained traction as a powerful tool for high-performance computing. However, it lacks dynamic memory allocation, one of the most useful features of modern programming languages and frameworks. Dynamic memory allocation lets the programmer determine memory requirements at run time. This facilitates applications that adapt to changing inputs and handle irregular workloads whose memory requirements are difficult to determine before the program starts running. We attempt to implement a dynamic memory allocator in OpenCL that provides this feature with minimal overhead while giving reliable and consistent allocations. We built three allocators using lock-free algorithms. The first is a naive allocator that does not allow deallocation but provides maximum performance. The second is an allocator that uses segregated lists and supports deallocation, at the cost of significant overhead. The third is an optimised two-level allocator that provides decent performance, given suitable initialisation, and can support deallocation. We used lock-free algorithms because they tend to be more efficient on massively parallel systems such as GPUs, where thousands of threads can run, while complex synchronisation techniques such as locking tend to be inefficient or infeasible. The optimised two-level allocator borrows a technique from earlier work on parallel dynamic memory allocation, namely Hoard, to divide the allocation process into a global, thread-safe phase and a local phase.
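
As an illustration of the first design, an allocator that never frees memory can be reduced to a single atomic counter over a pre-allocated pool. The following is a minimal sketch in plain C11, not the thesis code: `heap`, `heap_top`, `HEAP_SIZE` and `naive_alloc` are hypothetical names, the 8-byte alignment is an assumption, and `atomic_fetch_add` stands in for the atomic functions an OpenCL kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define HEAP_SIZE 1024u  /* hypothetical pool size in bytes */

static_assert(HEAP_SIZE % 8u == 0, "pool size must be a multiple of 8");

static _Alignas(8) unsigned char heap[HEAP_SIZE];  /* pre-allocated pool */
static atomic_size_t heap_top;                     /* offset of the next free byte */

/* Round a request up to a multiple of 8 bytes so blocks stay aligned. */
static size_t align_up(size_t n) { return (n + 7u) & ~(size_t)7u; }

/*
 * Lock-free allocation: one atomic fetch-add reserves the byte range
 * [old, old + size). No two callers can receive overlapping ranges,
 * and no caller ever waits on another.
 */
void *naive_alloc(size_t size) {
    if (size == 0 || size > HEAP_SIZE) {
        return NULL;                 /* request can never be satisfied */
    }
    size = align_up(size);
    size_t old = atomic_fetch_add(&heap_top, size);
    if (old > HEAP_SIZE || size > HEAP_SIZE - old) {
        return NULL;                 /* pool exhausted; offset is not rolled back */
    }
    return &heap[old];
}
```

Because the offset only ever grows, there is nothing to synchronise beyond the counter itself, which is consistent with this design being the fastest of the three, and there is equally no way to return a block to the pool.
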
By testing these allocators, we find that the naive allocator offers the best performance, but because it cannot support deallocation, the two-level allocator offers better overall value. We argue that the two-level allocator has potential for use in real-world applications once deallocation is implemented. Although its performance can be sluggish, it improves significantly when calls to the allocator are spaced out with other instructions in between. Performance also improves drastically with our initialisation technique, which allocates the initial local space in the host application.
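
The split into a global, thread-safe phase and a local phase, together with host-side initialisation, might be pictured roughly as follows. This is a sketch under stated assumptions rather than the allocator from the thesis: it is plain C11 instead of OpenCL C, all names and sizes (`pool`, `claim_chunk`, `two_level_init`, `two_level_alloc`, `CHUNK_SIZE`, and so on) are hypothetical, each thread is assumed to own exactly one local chunk at a time, and deallocation is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CHUNK_SIZE  256u  /* hypothetical size of one local chunk in bytes */
#define NUM_CHUNKS  8u    /* hypothetical number of chunks in the pool */
#define NUM_THREADS 4u    /* hypothetical number of threads (work-items) */

static_assert(NUM_THREADS <= NUM_CHUNKS, "need one initial chunk per thread");

static _Alignas(8) unsigned char pool[CHUNK_SIZE * NUM_CHUNKS];
static atomic_size_t next_chunk;   /* shared state used by the global phase */

typedef struct {
    unsigned char *base;  /* start of this thread's current chunk */
    size_t used;          /* bytes already handed out from that chunk */
} local_heap;

static local_heap locals[NUM_THREADS];  /* one private heap per thread */

/* Global phase: claim a whole chunk with a single atomic increment. */
static unsigned char *claim_chunk(void) {
    size_t idx = atomic_fetch_add(&next_chunk, 1);
    return idx < NUM_CHUNKS ? &pool[idx * CHUNK_SIZE] : NULL;
}

/* Stand-in for host-side initialisation: every thread gets a chunk up front. */
void two_level_init(void) {
    atomic_store(&next_chunk, 0);
    for (size_t t = 0; t < NUM_THREADS; ++t) {
        locals[t].base = claim_chunk();
        locals[t].used = 0;
    }
}

/* Local phase first; the global phase runs only when the chunk is full. */
void *two_level_alloc(size_t tid, size_t size) {
    if (tid >= NUM_THREADS || size == 0 || size > CHUNK_SIZE) {
        return NULL;
    }
    size = (size + 7u) & ~(size_t)7u;        /* keep blocks 8-byte aligned */
    local_heap *h = &locals[tid];
    if (h->base == NULL || size > CHUNK_SIZE - h->used) {
        unsigned char *fresh = claim_chunk();
        if (fresh == NULL) {
            return NULL;                     /* pool exhausted */
        }
        h->base = fresh;                     /* rest of the old chunk is abandoned */
        h->used = 0;
    }
    void *p = h->base + h->used;             /* no atomics: only tid touches h */
    h->used += size;
    return p;
}
```

In this sketch most requests are served by the local phase without touching shared state, and the pre-assigned chunks mean the first allocations of each thread skip the global phase altogether, which is one plausible reading of why initialising local space from the host helps.
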

Tags: ATI, ATI Radeon HD 7400 M, Computer science, Memory model, OpenCL, Thesis

August 31, 2015 by hgpu
