
Accelerating Kirchhoff Migration by CPU and GPU Cooperation

Jairo Panetta, Thiago Teixeira, Paulo R. P. de Souza Filho, Carlos A. da Cunha Filho, David Sotelo, Fernando M. da Motta, Silvio S. Pinheiro, Ivan P. Junior, Andre L. Rosa, Luiz R. Monnerat, Leandro T. Carneiro, Carlos H. B. de Albrecht
Tecnologia Geofisica, Petroleo Brasileiro SA, PETROBRAS, Rio de Janeiro, Brazil
In 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD ’09), 31 October 2009, pp. 26-32.

@conference{panetta2009accelerating,

   title={Accelerating Kirchhoff Migration by CPU and GPU Cooperation},

   author={Panetta, J. and Teixeira, T. and de Souza Filho, P.R.P. and da Cunha Filho, C.A. and Sotelo, D. and da Motta, F. and Pinheiro, S.S. and Pedrosa, I. and Rosa, A.L.R. and Monnerat, L.R. and others},

   booktitle={Computer Architecture and High Performance Computing, 2009. SBAC-PAD’09. 21st International Symposium on},

   pages={26--32},

   year={2009},

   organization={IEEE}

}


We discuss the performance of Petrobras' production Kirchhoff prestack seismic migration on a cluster of 64 GPUs and 256 CPU cores. Porting and optimizing the application hot spot (98.2% of single-CPU-core execution time) on a single GPU reduces total execution time by a factor of 36 on a control run. We then argue against the usual practice of porting the next hot spot (1.5% of single-CPU-core execution time) to the GPU. Instead, we show that cooperation between CPU and GPU reduces total execution time by a factor of 59 on the same control run. Remaining GPU idle cycles are eliminated by overloading the GPU with multiple requests originating from distinct CPU cores. However, increasing the number of CPU cores in the computation reduces the gain, because runs without GPUs benefit from the added parallelism while runs with GPUs saturate the GPU. We then obtain close-to-perfect speed-up on the full cluster over a homogeneous load built by replicating the control run data. To cope with the heterogeneous load of real-world data, we present a dynamic load balancing scheme that reduces total execution time by a factor of 20 on runs that use all GPUs and half of the cluster's CPU cores, relative to runs that use all CPU cores but no GPU.
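The percentages quoted in the abstract can be related to the reported factors with a back-of-the-envelope Amdahl's-law calculation. The sketch below is only an illustration: it assumes a purely serial model in which the ported part is accelerated and everything else runs unchanged afterwards, which is not a model stated by the authors. The function names `amdahl_speedup` and `implied_factor` are hypothetical helpers introduced here. Under this assumption, an overall factor of 36 would correspond to a hot-spot acceleration of roughly 100, and porting only the hot spot could not exceed an overall factor of about 55.6.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is accelerated by `factor`
    and the rest runs unchanged (serial model, an assumption of this sketch)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def implied_factor(fraction, overall):
    """Acceleration of the ported part implied by an observed overall speedup."""
    remaining = 1.0 / overall - (1.0 - fraction)
    if remaining <= 0:
        raise ValueError("overall speedup exceeds the serial-model bound")
    return fraction / remaining


HOT = 0.982   # hot spot share of single-core time (from the abstract)
NEXT = 0.015  # next hot spot share of single-core time (from the abstract)

# Hot-spot acceleration implied by the reported overall factor of 36.
kernel_factor = implied_factor(HOT, 36.0)

# Upper bounds on overall speedup if the ported parts cost nothing at all.
bound_hot_only = 1.0 / (1.0 - HOT)
bound_both = 1.0 / (1.0 - HOT - NEXT)

print(f"implied hot-spot acceleration: {kernel_factor:.1f}")
print(f"bound, hot spot only on GPU:   {bound_hot_only:.1f}")
print(f"bound, both hot spots on GPU:  {bound_both:.1f}")
```

The reported cooperative factor of 59 lies slightly above the hot-spot-only bound of this serial model, so the model cannot reproduce it (`implied_factor(HOT, 59.0)` raises an error). That is consistent with CPU and GPU work overlapping rather than running back to back, but the abstract does not give enough detail to say more, and the quoted percentages are rounded.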