## Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs

Oak Ridge National Laboratory, Oak Ridge, TN 37831-6171, U.S.A.

International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering (M&C 2011), 2011

@techreport{arbanas2011computation,

title={Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs},

author={Arbanas, G. and Dunn, M.E. and Wiarda, D.},

year={2011},

institution={Oak Ridge National Laboratory (ORNL)}

}

The computational power of Graphical Processing Units (GPUs) and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up the computation of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Subprograms (BLAS) library, which is used to compute the most time-consuming step. The 235U RPCM, previously computed using a triple-nested loop, was recomputed using the NVIDIA implementation of the subroutine on a single Tesla Fermi GPU, and also using the Intel Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000, which had previously taken days, took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. The uniform interfaces of standard linear algebra libraries make them promising candidates for the programming framework of a new generation of SAMMY targeting emerging heterogeneous computing platforms.
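The change described in the abstract, replacing a hand-written triple-nested loop with a call to a library matrix-matrix multiplication routine, can be sketched as follows. This is an illustration in Python, not SAMMY's own code: the function names `matmul_triple_loop`, `matmul_library`, and `gemm_flops` are hypothetical, and NumPy stands in for the vendor libraries named above (NumPy normally forwards double-precision matrix products to whichever BLAS it was built against). The operation count also assumes that the product has the shape (16,000×20,000)·(20,000×16,000), which the abstract does not state.

```python
try:
    import numpy as np  # assumption: NumPy is available and linked to a BLAS
except ImportError:     # keep the sketch runnable without NumPy
    np = None


def matmul_triple_loop(a, b):
    """Reference product C = A B using the classic triple-nested loop."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def matmul_library(a, b):
    """Same product through a library call; falls back to the loop without NumPy."""
    if np is None:
        return matmul_triple_loop(a, b)
    # For float64 2-D arrays the @ operator is typically dispatched to BLAS GEMM.
    return (np.asarray(a, dtype=float) @ np.asarray(b, dtype=float)).tolist()


def gemm_flops(m, n, k):
    """Floating-point operations for an (m x k) by (k x n) product."""
    return 2 * m * n * k


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_triple_loop(a, b))
    print(matmul_library(a, b))
    # Assumed shape: (16,000 x 20,000) times (20,000 x 16,000).
    print(gemm_flops(16_000, 16_000, 20_000))
```

Under the assumed shape the product costs about 1.0e13 floating-point operations, so a run time of roughly one minute would correspond to a sustained rate on the order of 1.7e11 operations per second. Because both functions take the same inputs and return the same result, the caller does not change when the implementation is swapped, which is the property of uniform library interfaces that the abstract highlights.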

December 3, 2011 by hgpu