high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Justin Richardson, Alan D. George, Herman Lam

NSF Center for High-Performance Reconfigurable Computing (CHREC), ECE Department, University of Florida, Gainesville, FL, USA

The Symposium on Application Accelerators in High-Performance Computing (SAAHPC), at Argonne National Laboratory, Lemont, IL, 2012

BibTeX

Download (PDF)

View

Source

2100

views

With the rising number of application accelerators, developers are looking for ways to evaluate new and competing platforms quickly, fairly, and early in the development cycle. As high-performance computing (HPC) applications increase their demands on application acceleration platforms, graphics processing units (GPUs) provide a potential solution for many developers looking for increased performance. Device performance metrics, such as Computational Density (CD), provide a useful but limited starting point for device comparison. The authors developed the Realizable Utilization (RU) metric and methodology to quantify the discrepancy between theoretical device performance shown by CD and the performance developers can achieve. As the RU score increases, the application is achieving a larger percentage of the computational power the device can provide. The authors survey technical publications about GPUs and use this data to analyze the RU scores for several arithmetic application kernels that are frequently accelerated in GPUs. The RU concepts presented in this paper are a first step towards a formalized comparison framework for diverse devices such as CPUs, FPGAs, GPUs and other novel architectures. GPU kernels for matrix multiplication, matrix decomposition, and N-body simulations show RU scores ranging from almost 0% to approaching 99% depending on the application, but all kernel areas show a significant decrease in RU as the computational capacities increase. Additionally, the RU scores show the higher realized performance of the GeForce 8 Series GPUs versus newer GPU architectures. This paper shows that applications running on GPUs with higher computational density report significantly lower RU scores than more mature GPUs with lower computational density. This trend implies that while the raw performance available is still increasing with newer GPUs, the achieved performance is not keeping pace with the theoretical capacities of the devices.

Tags: Computer science, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GT 240, nVidia GeForce GTX 285, nVidia GeForce GTX 480, Performance, Tesla C1060, Tesla C870

August 3, 2012 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Performance Analysis of GPU Accelerators with Realizable Utilization of Computational Density

Share this:

Recent source codes

Most viewed papers (last 30 days)