high performance computing on graphics processing units: hgpu.org

BLAS Comparison on FPGA, CPU and GPU

Srinidhi Kestur, John D. Davis, Oliver Williams

Dept. of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802

In Proceedings of the 2010 IEEE Annual Symposium on VLSI (2010), pp. 288-293.

DOI:10.1109/ISVLSI.2010.84

@conference{kestur2010blas,
  title={BLAS Comparison on FPGA, CPU and GPU},
  author={Kestur, S. and Davis, J.D. and Williams, O.},
  booktitle={Proceedings of the 2010 IEEE Annual Symposium on VLSI},
  pages={288--293},
  year={2010},
  organization={IEEE Computer Society}
}

High-performance computing (HPC) or scientific codes run on a wide variety of computing platforms, from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subprograms (BLAS) using double-precision floating point on an FPGA, a CPU, and a GPU. On the CPU and GPU, we use standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized, modular implementations of the dot product and of Gaxpy (matrix-vector multiplication). To obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator that performs an efficient reduction of floating-point values. To support scalability to large datasets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.
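For readers unfamiliar with the two kernels named in the abstract, the following is a minimal reference sketch in plain Python. It is not the authors' FPGA design: the function names `dot`, `tree_reduce`, and `gaxpy` are hypothetical, and the pairwise reduction is only an illustrative stand-in for the idea of reducing many floating-point partial products, not a model of the paper's accumulator. Gaxpy is taken here in its usual sense of y + A·x.

```python
from typing import List, Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product: sum of element-wise products, accumulated in order."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


def tree_reduce(values: Sequence[float]) -> float:
    """Sum values pairwise, level by level (illustrative reduction only)."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Add neighbouring pairs; an unpaired last element is carried over.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def gaxpy(A: Sequence[Sequence[float]],
          x: Sequence[float],
          y: Sequence[float]) -> List[float]:
    """Return y + A @ x, computing one reduced row product per output."""
    if len(A) != len(y) or any(len(row) != len(x) for row in A):
        raise ValueError("dimension mismatch")
    return [tree_reduce([a * xj for a, xj in zip(row, x)]) + yi
            for row, yi in zip(A, y)]
```

Because floating-point addition is not associative, the in-order sum in `dot` and the pairwise sum in `tree_reduce` can round differently on general inputs, which is one reason the reduction step receives separate attention in hardware designs.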

Tags: BLAS, Computer science, CUBLAS, CUDA, Energy-efficient computing, FPGA, Linear Algebra, nVidia, nVidia GeForce 9500 GT, Performance, Tesla C1060

December 6, 2010 by hgpu
