high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

L.A. Torres, Carlos J. Barrios H, Yves Denneulin

Supercomputación y Cálculo Científico (SC3UIS)

arXiv:2405.17322 [cs.DC], (27 May 2024)

DOI:10.48550/arXiv.2405.17322

@misc{torres2024evaluation,

title={Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL},

author={L. A. Torres and Carlos J. Barrios H and Yves Denneulin},

year={2024},

eprint={2405.17322},

archivePrefix={arXiv},

primaryClass={cs.DC}

}

Download (PDF)

View

Source

Source codes

Package:

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

1963

views

Matrix multiplication is fundamental in the backpropagation algorithm used to train deep neural network models. Libraries like Intel’s MKL or NVIDIA’s cuBLAS implemented new and optimized matrix multiplication techniques that increase performance and reduce computational costs. These techniques can also be implemented in CUDA and SYCL and functions with AVX2 and AVX512 instructions, which have lower performance but better precision. The study compares execution times and power consumption using PAPI and PERF and compares accuracy for different matrix sizes. Comparisons were made on architectures such as third and fourth-generation Intel CPUs and NVIDIA V100 and A100 GPUs. The MKL library showed the best performance with a slight loss of precision, while OpenMP and SYCL on the CPU implementation showed the best accuracy but a loss of performance. On the other hand, the results on GPU showed that cuBLAS with tensor cores had the best performance; however, it had a cost in accuracy. The cuBLAS library without these specialized cores shows minimal performance loss and much higher accuracy. The data obtained on different architectures showed that the CPU could achieve performance close to that obtained on the GPU with increased power consumption. These results are conditional on certain hardware specifications, such as the number of cores, clock frequency, processor generation for the CPU, and the speed and bandwidth of the PCI bus and device architecture (compute capability) for the GPU.

Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia A100, Package, Performance, SYCL

June 2, 2024 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

Package:

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

Package:

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)