high performance computing on graphics processing units: hgpu.org


Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures

Lokendra Singh Panwar

Virginia Polytechnic Institute and State University

Virginia Polytechnic Institute and State University, 2014

@phdthesis{panwar2014performance,
  title={Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures},
  author={Panwar, Lokendra Singh},
  year={2014},
  school={Virginia Polytechnic Institute and State University}
}


Heterogeneous computing has reshaped the way scientists think about and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and the Intel Many Integrated Core (MIC) architecture continue to make inroads in accelerating large-scale scientific applications. These advances, however, introduce new challenges for the scientific community, such as selecting the best processor for an application, finding effective performance optimization strategies, and maintaining performance portability across architectures. In this thesis, we present our techniques and approach to address some of these issues.

First, we present a fully automated approach to project the relative performance of an OpenCL program across different GPUs. Projections can be made in a small amount of time, and the projection overhead stays relatively constant regardless of the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run a given kernel faster. Use cases include scheduling or migrating GPU workloads across a heterogeneous cluster with different types of GPUs.

We then present our approach to accelerating a seismology modeling application based on the finite difference method (FDM), using MPI and CUDA on a hybrid CPU+GPU cluster. We describe the generic computational complexities involved in porting such applications to GPUs and present our strategy for efficient performance optimization and characterization. We also show how performance modeling can be used to reason about and drive hardware-specific optimizations on the GPU. Our performance evaluation shows a maximum speedup of 23-fold with a single GPU and 33-fold with dual GPUs per node over the serial version of the application, which in turn yields a many-fold speedup when coupled with the MPI distribution of the computation across the cluster. We also study the efficacy of GPU-integrated MPI, with MPI-ACC as an example implementation, in a seismology modeling application and discuss the lessons learned.
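For readers unfamiliar with the finite difference method named in the abstract, the following is a minimal sketch of one explicit time step for a 1D wave equation, using second-order central differences in space and time. It is not the thesis's code: the function name `fdm_wave_step`, the parameter `courant` (taken here to mean wave speed times time step divided by grid spacing), and the fixed-zero boundaries are all assumptions chosen for illustration. The point it shows is that each interior grid point is updated only from its neighbors at earlier time levels, so all points can be computed independently, which is the property that makes such stencils a natural fit for GPU threads.

```python
def fdm_wave_step(u_prev, u_curr, courant):
    """Advance a 1D wave field by one time step (illustrative sketch).

    u_prev, u_curr: field values at the previous and current time levels.
    courant: hypothetical parameter, assumed to be c * dt / dx.
    The two boundary points are held at zero (an assumption of this sketch).
    """
    n = len(u_curr)
    c2 = courant * courant
    u_next = [0.0] * n
    # Each interior point depends only on older time levels, so on a GPU
    # every iteration of this loop could be handled by a separate thread.
    for i in range(1, n - 1):
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next


def simulate(n_points=11, n_steps=5, courant=1.0):
    """Propagate a unit pulse placed at the center of the grid."""
    u_curr = [0.0] * n_points
    u_curr[n_points // 2] = 1.0
    u_prev = list(u_curr)  # same field at both levels: starts at rest
    for _ in range(n_steps):
        u_prev, u_curr = u_curr, fdm_wave_step(u_prev, u_curr, courant)
    return u_curr
```

Calling `simulate()` returns the field after a few steps. A production code of the kind the abstract describes would work on a multi-dimensional grid and split it across MPI ranks, with neighboring ranks exchanging boundary values every step.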

Tags: ATI, ATI Radeon HD 5870, ATI Radeon HD 7970, Computational Complexity, Computer science, CUDA, Finite difference, GPU cluster, Heterogeneous systems, MPI, nVidia, OpenCL, Seismology, Tesla C1060, Tesla C2050, Thesis

October 29, 2014 by hgpu

