high performance computing on graphics processing units: hgpu.org


Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators

Ahmad Abdelfattah, Jack Dongarra, David Keyes, Hatem Ltaief

KAUST Division of Mathematical and Computer Sciences and Engineering, Thuwal, Saudi Arabia

10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012), 2012

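@inproceedings{abdelfattah2012optimizing,
  title={Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators},
  author={Abdelfattah, Ahmad and Dongarra, Jack and Keyes, David and Ltaief, Hatem},
  booktitle={10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012)},
  year={2012}
}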


Package: Magma v.1.2.1


Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming languages (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical kernel for computing the symmetric matrix-vector product on nVidia Fermi GPUs. Because of its inherent memory-bound nature, this kernel is critical in the tridiagonalization of a symmetric dense matrix, which is a preprocessing step for calculating the eigenpairs. Using a novel design that addresses the irregular memory accesses by hiding latency and increasing bandwidth, our preliminary asymptotic results show 3.5x and 2.5x speedups over the similar CUBLAS 4.0 kernel, and 7-8% and 30% improvements over the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library, in single and double precision arithmetic, respectively.
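For readers unfamiliar with the operation, the following is a minimal CPU reference sketch of the symmetric matrix-vector product y := alpha*A*x + beta*y. It is not the authors' Fermi kernel. The function name `symv_lower` is hypothetical, and the argument layout (column-major storage, leading dimension `lda`, only the lower triangle referenced) is an assumption loosely modeled on the BLAS xSYMV convention. The point it illustrates is why the kernel is memory-bound: each off-diagonal element A(i,j) is loaded once but used twice, once as A(i,j) and once as its mirror A(j,i). That yields roughly 2n^2 flops for about n^2/2 matrix elements read, i.e. only about 0.5 flop per byte in double precision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference SYMV (single-threaded CPU sketch):
   y := alpha*A*x + beta*y, where A is n-by-n symmetric, stored
   column-major with leading dimension lda, and ONLY the lower
   triangle (i >= j) is ever read. */
void symv_lower(size_t n, double alpha, const double *A, size_t lda,
                const double *x, double beta, double *y)
{
    assert(lda >= n);

    /* Scale y by beta; with beta == 0 overwrite so stale values are ignored. */
    for (size_t i = 0; i < n; ++i)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    for (size_t j = 0; j < n; ++j) {
        double axj = alpha * x[j]; /* scaled x[j], reused down column j */
        double tmp = 0.0;          /* sum over i > j of A(j,i) * x[i], via symmetry */

        y[j] += axj * A[j + j * lda]; /* diagonal element, used once */

        for (size_t i = j + 1; i < n; ++i) {
            double aij = A[i + j * lda]; /* loaded once ...              */
            y[i] += axj * aij;           /* ... used as A(i,j) (column)  */
            tmp  += aij * x[i];          /* ... and as A(j,i) (row)      */
        }
        y[j] += alpha * tmp;
    }
}
```

The two uses of `aij` traverse the stored triangle in different directions: the column update writes `y[i]` for many `i`, while `tmp` reduces along what is logically row `j`. On a GPU this dual access pattern is, as far as this sketch suggests, the kind of irregularity that a tuned kernel has to reorganize to keep memory loads efficient.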

Tags: Computer science, CUBLAS, CUDA, nVidia, Optimization, Package, Tesla C2070

August 10, 2012 by hgpu

