Multi-GPU implementation of a VMAT treatment plan optimization algorithm
Department of Radiation Oncology, University of Texas, Southwestern Medical Center, Dallas, TX 75390
arXiv:1503.01721 [physics.med-ph], (5 Mar 2015)
@article{tian2015multigpu,
title={Multi-GPU implementation of a VMAT treatment plan optimization algorithm},
author={Tian, Zhen and Peng, Fei and Folkerts, Michael and Tan, Jun and Jia, Xun and Jiang, Steve B.},
year={2015},
month={mar},
archivePrefix={arXiv},
eprint={1503.01721},
primaryClass={physics.med-ph}
}
Volumetric modulated arc therapy (VMAT) optimization is computationally challenging because of its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, their limited memory cannot accommodate cases with a large dose-deposition coefficient (DDC) matrix. This paper reports an implementation of our column-generation-based VMAT algorithm on a multi-GPU platform to overcome this memory limitation. The column-generation approach generates apertures sequentially by iteratively solving a pricing problem (PP) and a master problem (MP). The DDC matrix is split into four sub-matrices according to beam angles, and the sub-matrices are stored on four GPUs in compressed sparse row (CSR) format. Beamlet prices are computed on multiple GPUs, while the remaining steps of the PP and MP are implemented on a single GPU because of their modest computational loads. A head-and-neck (H&N) patient case was used to validate our method. We compare our multi-GPU implementation with three single-GPU implementation strategies: truncating the DDC matrix (S1), repeatedly transferring the DDC matrix between CPU and GPU (S2), and porting the computations involving the DDC matrix to the CPU (S3). Two more H&N patient cases and three prostate cases were also used to demonstrate the advantages of our method. Our multi-GPU implementation finishes the optimization within ~1 minute for the H&N patient case. S1 yields inferior plan quality, although its total time is 10 seconds shorter than that of the multi-GPU implementation. S2 and S3 yield the same plan quality as the multi-GPU implementation but take ~4 minutes and ~6 minutes, respectively. High computational efficiency was consistently achieved for the other five cases. The results demonstrate that the multi-GPU implementation can handle the large-scale VMAT optimization problem efficiently without sacrificing plan quality.
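The partitioning idea in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' code: plain Python loops stand in for the per-GPU kernels, all function and variable names are hypothetical, and it assumes the common column-generation form in which the price of beamlet j is the sum over voxels i of D_ij times the objective gradient with respect to the dose in voxel i. The abstract does not spell out that formula or the matrix orientation, so both are assumptions here (the sub-matrices are stored with one row per beamlet).

```python
# Illustrative sketch only: each "part" stands in for one GPU's sub-matrix.
# Assumption (not stated in the abstract): price[j] = sum_i D[i][j] * grad[i].

def to_csr(rows):
    """Convert dense rows to CSR arrays: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for col, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(col)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(csr, x):
    """Multiply a CSR matrix by a dense vector x."""
    values, col_idx, row_ptr = csr
    out = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]
        out.append(s)
    return out

def split_by_angle(ddc_t, angle_group, n_parts):
    """Split a beamlet-by-voxel matrix into n_parts sub-matrices by angle group.

    ddc_t[j] is the row of DDC values for beamlet j (hypothetical layout);
    angle_group[j] is the index of the part that owns beamlet j.
    """
    parts = []
    for p in range(n_parts):
        ids = [j for j, g in enumerate(angle_group) if g == p]
        parts.append((ids, to_csr([ddc_t[j] for j in ids])))
    return parts

def beamlet_prices(parts, grad, n_beamlets):
    """Compute every beamlet price; each loop pass mimics one device's work."""
    prices = [0.0] * n_beamlets
    for ids, csr in parts:
        for j, price in zip(ids, csr_matvec(csr, grad)):
            prices[j] = price
    return prices

# Toy data: 6 beamlets, 3 voxels, 4 angle groups (made-up numbers).
ddc_t = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [4.0, 0.0, 0.0],
]
angle_group = [0, 0, 1, 2, 3, 3]
grad = [1.0, -2.0, 0.5]

parts = split_by_angle(ddc_t, angle_group, n_parts=4)
prices = beamlet_prices(parts, grad, n_beamlets=len(ddc_t))
```

In this sketch only the gradient vector has to reach every part, and each part returns prices for its own beamlets. That is consistent with the abstract's point that the large DDC matrix never needs to fit on a single device.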
March 6, 2015 by hgpu