high performance computing on graphics processing units: hgpu.org

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer

University of California, Berkeley

arXiv:1611.06945 [cs.NE], (21 Nov 2016)

@article{moskewicz2016metaprogramming,
  title={A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications},
  author={Moskewicz, Matthew W. and Jannesari, Ali and Keutzer, Kurt},
  year={2016},
  month={nov},
  archivePrefix={arXiv},
  primaryClass={cs.NE}
}

Package: Boda: A C++ Framework for Efficient Experiments in Computer Vision

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high-efficiency GPU programming remains difficult: it suffers from high complexity, low productivity, and low portability. GPU vendors such as NVIDIA have spent enormous effort to write special-purpose DNN libraries. However, on other hardware targets, especially mobile GPUs, such vendor libraries are not generally available. Thus, the development of portable, open, high-performance, energy-efficient GPU code for DNN operations would enable broader deployment of DNN-based algorithms. Toward this end, this work presents a framework to enable productive, high-efficiency GPU programming for DNN computations across hardware platforms and programming models. In particular, the framework provides specific support for metaprogramming, autotuning, and DNN-tailored data types. Using our framework, we explore implementing DNN operations on three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA GPUs, we show both portability between OpenCL and CUDA as well as competitive performance compared to the vendor library. On Qualcomm GPUs, we show that our framework enables productive development of target-specific optimizations and achieves reasonable absolute performance. Finally, on AMD GPUs, we show initial results that indicate our framework can yield reasonable performance on a new platform with minimal effort.

Tags: AMD R9 Nano, ARM, ATI, Code generation, Computer science, Computer vision, CUDA, Deep learning, Neural networks, nVidia, nVidia GeForce GTX Titan X, OpenCL, Package

November 23, 2016 by hgpu
