high performance computing on graphics processing units: hgpu.org

Performance models for CUDA streams on NVIDIA GeForce series

Juan Gomez-Luna, Jose Maria Gonzalez-Linares, Jose Ignacio Benavides, Nicolas Guil

Dept. of Computer Architecture and Electronics, University of Cordoba

University of Cordoba, 2012

@article{gomez2012performance,
  title={Performance models for CUDA streams on NVIDIA GeForce series},
  author={G{\'o}mez-Luna, J. and Gonz{\'a}lez-Linares, J.M. and Benavides, J.I. and Guil, N.},
  year={2012}
}

Graphics Processing Units (GPUs) have risen impressively as general-purpose coprocessors in high performance computing applications since the launch of the Compute Unified Device Architecture (CUDA). However, they present an inherent performance bottleneck: communication between two separate address spaces (the main memory of the CPU and the memory of the GPU) is unavoidable. The CUDA Application Programming Interface (API) provides asynchronous transfers and streams, which permit a staged execution, as a way to overlap communication and computation. Nevertheless, there is neither a precise way to estimate the possible improvement due to overlapping nor a rule to determine the optimal number of stages or streams into which the computation should be divided. In this work, we present a methodology for modeling the performance of asynchronous data transfers of CUDA streams on different GPU architectures. We illustrate this methodology by deriving performance expressions for two different consumer graphics architectures belonging to the more recent generations. These models make it possible to estimate the optimal number of streams into which the computation on the GPU should be broken up in order to obtain the highest performance improvement. Finally, we have successfully checked the suitability of our performance models on several NVIDIA devices belonging to the GeForce 8, 9, 200, 400 and 500 series.
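To make the trade-off behind "optimal number of streams" concrete, here is a minimal sketch of an idealized two-stage pipeline (host-to-device copy, then kernel). It is not the model derived in the paper. It assumes one copy engine and one compute engine, equal-sized chunks, and a fixed per-stream overhead. The names `staged_time`, `best_stream_count`, and `t_overhead` are hypothetical and chosen only for illustration.

```python
def staged_time(t_transfer, t_kernel, n_streams, t_overhead=0.0):
    """Idealized total time when the work is split into n_streams equal chunks.

    Assumptions (not taken from the paper): copies are serialized on a single
    copy engine, kernels on a single compute engine, and each chunk's kernel
    starts only after that chunk's copy has finished. t_overhead is a fixed
    cost added to every chunk's copy.
    """
    copy_chunk = t_transfer / n_streams
    kernel_chunk = t_kernel / n_streams
    copy_done = 0.0    # time at which the latest copy finishes
    kernel_done = 0.0  # time at which the latest kernel finishes
    for _ in range(n_streams):
        copy_done += copy_chunk + t_overhead
        # The kernel waits for its own data and for the previous kernel.
        kernel_done = max(kernel_done, copy_done) + kernel_chunk
    return kernel_done


def best_stream_count(t_transfer, t_kernel, t_overhead, max_streams=64):
    """Brute-force search for the stream count with the lowest idealized time."""
    return min(
        range(1, max_streams + 1),
        key=lambda n: staged_time(t_transfer, t_kernel, n, t_overhead),
    )


if __name__ == "__main__":
    # Arbitrary illustrative timings in milliseconds, not measured values.
    for n in (1, 2, 4, 8, 16):
        print(n, staged_time(10.0, 10.0, n, t_overhead=0.1))
    print("best:", best_stream_count(10.0, 10.0, 0.1))
```

In this toy model, adding streams shortens the non-overlapped head and tail of the pipeline, while the per-stream overhead grows linearly, so the total time has a minimum at a finite stream count. Real devices differ in the number of copy engines and in scheduling behavior, which is why architecture-specific models are needed.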

Tags: Computer science, CUDA, nVidia, nVidia GeForce 8800 GTS, nVidia GeForce 9800 GX2, nVidia GeForce GTX 260, nVidia GeForce GTX 280, nVidia GeForce GTX 480, nVidia GeForce GTX 580, Performance

July 9, 2012 by hgpu
