high performance computing on graphics processing units: hgpu.org

BLAS Comparison on FPGA, CPU and GPU

Srinidhi Kestur, John D. Davis, Oliver Williams

Dept. of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802

In Proceedings of the 2010 IEEE Annual Symposium on VLSI (2010), pp. 288-293.

DOI:10.1109/ISVLSI.2010.84

@conference{kestur2010blas,
  title={BLAS Comparison on FPGA, CPU and GPU},
  author={Kestur, S. and Davis, J.D. and Williams, O.},
  booktitle={Proceedings of the 2010 IEEE Annual Symposium on VLSI},
  pages={288--293},
  year={2010},
  organization={IEEE Computer Society}
}

High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have developed parameterized modular implementations for the dot-product and Gaxpy or matrix-vector multiplication. In order to obtain optimal performance for any aspect ratio of the matrices, we have designed a high-throughput accumulator to perform an efficient reduction of floating point values. To support scalability to large data-sets, we target the BEE3 FPGA platform. We use performance and energy efficiency as metrics to compare the different platforms. Results show that FPGAs offer comparable performance as well as 2.7 to 293 times better energy efficiency for the test cases that we implemented on all three platforms.
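For readers unfamiliar with the two kernels named in the abstract, the following is a minimal pure-Python sketch of a dot product and of Gaxpy, taken here in its usual sense of y = A·x + y. It is not the authors' FPGA design: the function names (`pairwise_sum`, `dot`, `gaxpy`) are hypothetical, and the pairwise (tree-style) summation only illustrates the general idea of reducing many floating-point products to a single value. It makes no claim about how the paper's accumulator works.

```python
from typing import List, Sequence


def pairwise_sum(values: Sequence[float]) -> float:
    """Reduce values by repeatedly adding neighbouring pairs (tree reduction)."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        # Add elements two at a time; each pass roughly halves the list.
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            # Carry the unpaired last element into the next pass.
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product: multiply element-wise, then reduce the products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return pairwise_sum([a * b for a, b in zip(x, y)])


def gaxpy(A: Sequence[Sequence[float]], x: Sequence[float],
          y: Sequence[float]) -> List[float]:
    """Gaxpy: return A*x + y, computed as one dot product per matrix row."""
    if len(A) != len(y):
        raise ValueError("A must have one row per element of y")
    return [dot(row, x) + yi for row, yi in zip(A, y)]
```

Each output element of `gaxpy` is one dot product over a row of `A`, so the row length sets the length of each reduction and the row count sets how many independent reductions there are.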

Tags: BLAS, Computer science, CUBLAS, CUDA, Energy-efficient computing, FPGA, Linear Algebra, nVidia, nVidia GeForce 9500 GT, Performance, Tesla C1060

December 6, 2010 by hgpu
