high performance computing on graphics processing units: hgpu.org


A characterization and analysis of PTX kernels

Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, 30332-0250, USA

IEEE International Symposium on Workload Characterization, 2009. IISWC 2009

DOI:10.1109/IISWC.2009.5306801

@article{kerr2009characterization,
  title={A characterization and analysis of ptx kernels},
  author={Kerr, A. and Diamos, G. and Yalamanchili, S.},
  year={2009},
  publisher={IEEE}
}


General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA’s CUDA, OpenCL, and Intel’s Ct. While significant effort has been focused on developing and evaluating applications and software tools, comparatively little has been devoted to the analysis and characterization of applications to assist future work in compiler optimizations, application restructuring, and micro-architecture design. This paper proposes a set of metrics for GPU workloads and uses these metrics to analyze the behavior of GPU programs. We report on an analysis of over 50 kernels and applications, including the full NVIDIA CUDA SDK and UIUC’s Parboil Benchmark Suite, covering control flow, data flow, parallelism, and memory behavior. The analysis was performed using a full-function emulator we developed that implements the NVIDIA virtual machine referred to as PTX (parallel thread execution architecture), a machine model and low-level virtual ISA that is representative of ISAs for data-parallel execution. The emulator can execute compiled kernels from the CUDA compiler, currently supports the full PTX 1.4 specification, and has been validated against the full CUDA SDK. The results quantify the importance of optimizations such as those for branch reconvergence and the prevalence of sharing between threads, and highlight opportunities for additional parallelism.
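To make the idea of a control-flow metric concrete, here is a minimal Python sketch of how an emulator could score the cost of a divergent branch. It is an illustration only: the metric name `activity_factor`, the helper `divergent_branch_counts`, the 32-thread warp width, and the simple "serialize both paths, then reconverge" execution model are all assumptions made for this example, not definitions taken from the paper or its emulator.

```python
from typing import List

WARP_SIZE = 32  # assumption: threads per warp; not stated in the abstract


def divergent_branch_counts(taken: List[bool], then_len: int, else_len: int) -> List[int]:
    """Active-thread count for each instruction issued across one if/else.

    Hypothetical model: threads that take the branch run `then_len`
    instructions, the others run `else_len` instructions, and the two
    paths are issued one after the other until the threads reconverge.
    A path with no threads on it is not issued at all.
    """
    n_taken = sum(taken)
    n_not_taken = len(taken) - n_taken
    counts: List[int] = []
    if n_taken:
        counts += [n_taken] * then_len
    if n_not_taken:
        counts += [n_not_taken] * else_len
    return counts


def activity_factor(active_counts: List[int], warp_size: int = WARP_SIZE) -> float:
    """Average fraction of the warp doing useful work per issued instruction."""
    if not active_counts:
        return 1.0  # nothing issued, so nothing was wasted
    return sum(active_counts) / (warp_size * len(active_counts))


# A uniform branch keeps every thread busy.
uniform = divergent_branch_counts([True] * WARP_SIZE, then_len=4, else_len=4)

# A branch that splits the warp in half issues both paths at half occupancy.
half = divergent_branch_counts([i < 16 for i in range(WARP_SIZE)], then_len=4, else_len=4)

print(activity_factor(uniform), activity_factor(half))
```

Under this model, a value near 1.0 means the warp rarely diverges, while lower values indicate that much of the issue bandwidth is spent on partially filled warps, which is the kind of behavior that branch reconvergence optimizations target.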

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 280, Performance, PTX

April 2, 2011 by hgpu
