high performance computing on graphics processing units: hgpu.org


Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs

G. Arbanas, M.E. Dunn, D. Wiarda

Oak Ridge National Laboratory, Oak Ridge, TN 37831-6171, U.S.A.

International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering (M&C 2011), 2011

@techreport{arbanas2011computation,
  title={Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs},
  author={Arbanas, G. and Dunn, M.E. and Wiarda, D.},
  year={2011},
  institution={Oak Ridge National Laboratory (ORNL)}
}


The computational power of Graphical Processing Units (GPUs) and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Subprograms (BLAS) to compute the most time-consuming step. The ²³⁵U RPCM, previously computed with a triple-nested loop, was re-computed using NVIDIA's implementation of this subroutine (CUBLAS) on a single Tesla Fermi GPU, and using the implementation in Intel's Math Kernel Library (MKL) on two different multicore CPU systems. A multiplication of two 16,000×20,000 matrices that had previously taken days took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. The uniform interfaces of standard linear algebra libraries make them a promising candidate for the programming framework of a new generation of SAMMY targeting emerging heterogeneous computing platforms.
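The step that was offloaded is a dense matrix-matrix product, the BLAS GEMM operation. The minimal Python sketch below assumes NumPy is installed and is not SAMMY's own code. It contrasts a triple-nested loop of the kind the abstract says was used previously with NumPy's `@` operator, which for float64 arrays typically dispatches to the DGEMM routine of whatever BLAS library NumPy is linked against (OpenBLAS, MKL, and so on). The function names are hypothetical. The helper `gemm_flops` uses the common 2mnk operation count. Applying it to the quoted problem assumes the product is a 16,000×20,000 matrix times the transpose of a 16,000×20,000 matrix, which the abstract does not state explicitly.

```python
import time

import numpy as np


def matmul_triple_loop(a, b):
    """Reference product C = A @ B using a plain triple-nested loop."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    c = np.zeros((m, n), dtype=np.float64)
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            c[i, j] = s
    return c


def matmul_blas(a, b):
    """Same product via NumPy's matmul; for float64 2-D arrays this
    typically calls the DGEMM routine of the linked BLAS library."""
    return a @ b


def gemm_flops(m, n, k):
    """Operation count for an (m x k) by (k x n) product:
    m*n*k multiplications plus m*n*k additions."""
    return 2 * m * n * k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((60, 80))
    b = rng.standard_normal((80, 60))

    t0 = time.perf_counter()
    c_loop = matmul_triple_loop(a, b)
    t1 = time.perf_counter()
    c_blas = matmul_blas(a, b)
    t2 = time.perf_counter()

    print("max abs difference:", np.max(np.abs(c_loop - c_blas)))
    print(f"triple loop: {t1 - t0:.3f} s, BLAS: {t2 - t1:.6f} s")

    # Assumption: the quoted product is (16,000 x 20,000) times its transpose.
    flops = gemm_flops(16_000, 16_000, 20_000)
    print(f"{flops:.3e} flops; at ~60 s that is ~{flops / 60 / 1e9:.0f} GFLOP/s")
```

Under that assumption the product needs about 1.0×10¹³ floating-point operations. Finishing in roughly one minute therefore corresponds to a sustained rate on the order of 170 GFLOP/s. In the small demo the loop-versus-library gap is dominated by interpreter overhead. Vendor BLAS implementations gain further from cache blocking, vectorization, and multithreading.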

Tags: CUBLAS, CUDA, Heterogeneous systems, Linear Algebra, Matrix multiplication, Nuclear Experiment, nVidia, Physics, Tesla C2050

December 3, 2011 by hgpu
