Parallel Memory Defragmentation on a GPU

Ronald Veldema, Michael Philippsen

University of Erlangen-Nuremberg, Computer Science Department 2, Martensstr. 3, 91058 Erlangen, Germany

2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC ’12), 2012

DOI:10.1145/2247684.2247693

@inproceedings{Veldema:2012:PMD:2247684.2247693,
  author={Veldema, Ronald and Philippsen, Michael},
  title={Parallel memory defragmentation on a GPU},
  booktitle={Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness},
  series={MSPC ’12},
  year={2012},
  isbn={978-1-4503-1219-6},
  location={Beijing, China},
  pages={38--47},
  numpages={10},
  url={http://doi.acm.org/10.1145/2247684.2247693},
  doi={10.1145/2247684.2247693},
  acmid={2247693},
  publisher={ACM},
  address={New York, NY, USA},
  keywords={GPU, garbage collection, mark and sweep, parallel}
}

High-throughput memory management techniques such as malloc/free or mark-and-sweep collectors often exhibit memory fragmentation, leaving allocated objects interspersed with free memory holes. Memory defragmentation removes such holes by moving objects around in memory so that they become adjacent (compaction) and so that holes can be merged (coalesced) into larger holes. However, known defragmentation techniques are slow. This paper presents a parallel solution to best-effort partial defragmentation that makes use of all available cores. The solution not only speeds up defragmentation significantly, but also scales to many simple cores. It can therefore even be implemented on a GPU. One problem with compaction is that it requires all references to moved objects to be retargeted to point to their new locations. This paper further improves on existing work by better identifying the parts of the heap that contain references to objects moved by the compactor; only these parts are processed to find the references, which are then retargeted in parallel. To demonstrate the performance of the new memory defragmentation algorithm on many-core processors, we show its performance on a modern GPU. Parallelization speeds up compaction 40 times and coalescing up to 32 times. After compaction, our algorithm only needs to process 2%–4% of the total heap to retarget references.
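
The three operations named in the abstract (compaction, reference retargeting, and hole coalescing) can be illustrated with a small sequential sketch. This is not the paper's algorithm: the heap model (a list of cells, each either free or holding a list of references) and the function names `compact` and `coalesce` are assumptions made for illustration. The sketch uses a prefix sum to compute new object positions, which is a common building block for parallel compaction, and it retargets references by scanning every live cell, whereas the paper reports processing only the parts of the heap that may hold affected references.

```python
from itertools import accumulate


def coalesce(holes):
    """Merge adjacent free holes given as (start, size) pairs."""
    merged = []
    for start, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This hole begins exactly where the previous one ends: merge.
            prev_start, prev_size = merged[-1]
            merged[-1] = (prev_start, prev_size + size)
        else:
            merged.append((start, size))
    return merged


def compact(heap):
    """Slide live cells to the front of the heap and retarget references.

    heap: list whose entries are None (free cell) or a list of heap
    indices (the references held by the object stored in that cell).
    Returns (new_heap, forward), where forward maps old index -> new index.
    """
    live = [0 if cell is None else 1 for cell in heap]
    # Exclusive prefix sum: number of live cells before each index.
    offsets = [0] + list(accumulate(live))[:-1]
    forward = {old: offsets[old] for old, flag in enumerate(live) if flag}
    new_heap = [None] * len(heap)
    for old, new in forward.items():
        # Retarget each reference through the forwarding table.
        new_heap[new] = [forward[ref] for ref in heap[old]]
    return new_heap, forward


# Hypothetical example heap: cells 1 and 3 are free holes.
heap = [[2], None, [0, 4], None, [2]]
new_heap, forward = compact(heap)
print(new_heap)   # [[1], [0, 2], [1], None, None]
print(coalesce([(5, 2), (0, 3), (3, 2), (10, 1)]))  # [(0, 7), (10, 1)]
```

In this model each per-cell step (computing an offset, moving a cell, rewriting its references) is independent of the others once the prefix sum is known, which is what makes this style of compaction a natural fit for many simple cores.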

Tags: Algorithms, Computer science, CUDA, Memory model, nVidia, nVidia GeForce GTX 560 Ti, Performance, Storage system

July 6, 2012 by hgpu
