high performance computing on graphics processing units: hgpu.org


Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs

G. Arbanas, M.E. Dunn, D. Wiarda

Oak Ridge National Laboratory, Oak Ridge, TN 37831-6171, U.S.A.

International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering (M&C 2011), 2011

@techreport{arbanas2011computation,
  title={Computation of Large Covariance Matrices by SAMMY on Graphical Processing Units and Multicore CPUs},
  author={Arbanas, G. and Dunn, M.E. and Wiarda, D.},
  year={2011},
  institution={Oak Ridge National Laboratory (ORNL)}
}


The nuclear data evaluation code SAMMY harnessed the computational power of Graphical Processing Units (GPUs) and multicore CPUs to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Subprograms (BLAS) library to compute the most time-consuming step. The 235U RPCM, computed previously using a triple-nested loop, was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi GPU, and also using Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. The uniform interfaces of standard linear algebra libraries make them a promising candidate for the programming framework of a new generation of SAMMY on emerging heterogeneous computing platforms.
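To give a sense of scale for the numbers quoted in the abstract, the following is a minimal Python sketch, not SAMMY code. It shows the kind of triple-nested loop that a BLAS matrix-matrix multiplication call replaces, and estimates the arithmetic work involved. The function names are hypothetical, and the shapes are an assumption: the abstract gives only "two matrices of dimensions 16,000×20,000", so the sketch assumes a product of a 16,000×20,000 matrix with a 20,000×16,000 one. The elapsed time of 60 seconds stands in for "approximately one minute", so the resulting rate is a rough estimate only.

```python
# Illustrative sketch only: names and shapes are assumptions, not taken from SAMMY.


def matmul_triple_loop(a, b):
    """Compute C = A * B with the classic triple-nested loop.

    `a` is an m x k matrix and `b` is a k x n matrix, both given as
    lists of rows. This is the O(m*n*k) pattern that a vendor-optimized
    BLAS matrix-matrix multiplication routine replaces.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def gemm_flops(m, n, k):
    """Floating-point operations for an (m x k) by (k x n) product.

    Each of the m*n output entries needs k multiplications and k additions.
    """
    return 2 * m * n * k


def sustained_gflops(m, n, k, seconds):
    """Average rate in GFLOP/s if the product takes `seconds` to compute."""
    return gemm_flops(m, n, k) / seconds / 1e9


if __name__ == "__main__":
    # Tiny correctness check of the loop on a 2 x 2 example.
    print(matmul_triple_loop([[1.0, 2.0], [3.0, 4.0]],
                             [[5.0, 6.0], [7.0, 8.0]]))

    # Assumed shapes: (16000 x 20000) times (20000 x 16000).
    m, n, k = 16000, 16000, 20000
    print("operations:", gemm_flops(m, n, k))
    print("GFLOP/s at ~60 s:", sustained_gflops(m, n, k, 60.0))
```

Under these assumptions the product requires on the order of 10^13 floating-point operations, so finishing in about a minute corresponds to a sustained rate in the range of a hundred or more GFLOP/s. An unoptimized triple-nested loop on matrices this large typically reaches only a small fraction of that rate.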

Tags: CUBLAS, CUDA, Heterogeneous systems, Linear Algebra, Matrix multiplication, Nuclear Experiment, nVidia, Physics, Tesla C2050

December 3, 2011 by hgpu

