Performance in GPU Architectures: Potentials and Distances

Ahmad Lashgar, Amirali Baniasadi

School of Electrical and Computer Engineering, College of Engineering, University of Tehran

9th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD), 2011

GPUs can deliver up to one TFLOPS at peak performance. This peak, however, is rarely reached because of resource underutilization. Three parameters contribute to this inefficiency: branch divergence, memory access delays, and limited workload parallelism. To quantify their impact, we propose machine models that estimate the performance gain obtainable by eliminating each performance-degrading parameter. Such estimates indicate how much improvement designers could expect by investing in different GPU subsections. Moreover, our models show how much performance is lost compared to an ideal GPU as a result of non-ideal GPU components. We conclude that memory is by far the most important of the three parameters impacting performance. We show that in the presence of an ideal memory system, GPU performance can reach within 59% of an ideal system. Meanwhile, using an ideal control-flow mechanism or unlimited execution resources does not have the same impact. In fact, as we show in this study, an ideal control-flow mechanism could harm performance by increasing pressure on the memory system. In addition, we study our models under GPUs exploiting aggressive memory systems and well-equipped Streaming Multiprocessors. We investigate how previously suggested control-flow solutions impact performance-degrading issues and make recommendations to enhance control-flow mechanisms.
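To make the branch divergence parameter concrete, the following is a minimal sketch of how divergence inside one warp reduces SIMD lane utilization. It is not the authors' machine model: the function name `simd_utilization`, the 32-thread warp width, and the per-path instruction costs are all assumptions chosen for illustration, and the sketch ignores memory delays and the number of available warps.

```python
WARP_SIZE = 32  # assumed warp width (hypothetical parameter for this sketch)


def simd_utilization(taken_mask, then_cost, else_cost):
    """Fraction of lane-cycles doing useful work for a single if/else.

    taken_mask: one bool per thread in the warp (True = takes the 'then' path).
    then_cost, else_cost: illustrative instruction counts for each path.
    """
    n_then = sum(taken_mask)
    n_else = len(taken_mask) - n_then
    # A path is issued only if at least one thread needs it; when both are
    # needed they run one after the other, with inactive lanes masked off.
    cycles = (then_cost if n_then else 0) + (else_cost if n_else else 0)
    if cycles == 0:
        return 1.0
    useful = n_then * then_cost + n_else * else_cost
    return useful / (cycles * len(taken_mask))


uniform = [True] * WARP_SIZE                    # every thread takes 'then'
split = [i % 2 == 0 for i in range(WARP_SIZE)]  # threads alternate paths

print(simd_utilization(uniform, 10, 10))  # 1.0
print(simd_utilization(split, 10, 10))    # 0.5
```

Under these assumptions a warp that splits across two equal-cost paths keeps only half of its lane-cycles busy. An idealized control-flow mechanism would recover that loss, although, as the abstract notes, doing so can raise pressure on the memory system.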

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia Quadro FX 5800, Performance

December 15, 2011 by hgpu
