GPUNet: Searching the Deployable Convolution Neural Networks for GPUs


Linnan Wang, Chenhan Yu, Satish Salian, Slawomir Kierat, Szymon Migacz, Alex Fit Florea

NVIDIA

arXiv:2205.00841 [cs.CV] (26 Apr 2022)

DOI:10.48550/arXiv.2205.00841


Customizing Convolutional Neural Networks (CNNs) for production use has been a challenging task for deep learning (DL) practitioners. This paper aims to expedite model customization with a model hub that contains optimized models, tiered by their inference latency, found using Neural Architecture Search (NAS). To achieve this goal, we build a distributed NAS system that searches a novel search space composed of the prominent factors that impact latency and accuracy. Since we target GPUs, we name the NAS-optimized models GPUNet; they establish a new state-of-the-art (SOTA) Pareto frontier in inference latency and accuracy. Within 1 ms, GPUNet is 2x faster than EfficientNet-X and FBNetV3 with even better accuracy. We also validate GPUNet on detection tasks, where it consistently outperforms EfficientNet-X and FBNetV3 on COCO in both latency and accuracy. These results validate that our NAS system is effective and generic enough to handle different design tasks. With this NAS system, we expand GPUNet to cover a wide range of latency targets so that DL practitioners can deploy our models directly in different scenarios.
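The abstract rests on two ideas: a Pareto frontier in latency and accuracy, and a model hub tiered by latency so that a practitioner can pick a model for a given latency budget. The sketch below illustrates both under simple assumptions. Each candidate has one measured latency and one accuracy value. The model names and numbers are hypothetical placeholders, not results from the paper. This is not the authors' NAS system, only the frontier and selection logic that such a hub implies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical model identifier
    latency_ms: float  # measured inference latency (lower is better)
    accuracy: float    # top-1 accuracy in percent (higher is better)


def pareto_frontier(cands: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by latency.

    A model is dominated if another model is no slower and at least as
    accurate. Exact duplicates are kept only once.
    """
    # Sort by latency ascending; for equal latency, most accurate first.
    ordered = sorted(cands, key=lambda c: (c.latency_ms, -c.accuracy))
    frontier: List[Candidate] = []
    best_acc = float("-inf")
    for c in ordered:
        # Keep a model only if it beats every faster (or equally fast) model seen so far.
        if c.accuracy > best_acc:
            frontier.append(c)
            best_acc = c.accuracy
    return frontier


def pick_for_budget(frontier: List[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Most accurate frontier model whose latency fits the budget, or None."""
    fitting = [c for c in frontier if c.latency_ms <= budget_ms]
    return max(fitting, key=lambda c: c.accuracy, default=None)


# Hypothetical measurements, for illustration only.
models = [
    Candidate("net-a", 0.6, 78.0),
    Candidate("net-b", 0.9, 80.5),
    Candidate("net-c", 1.0, 79.0),  # dominated by net-b: slower and less accurate
    Candidate("net-d", 1.8, 82.0),
]

front = pareto_frontier(models)
print([c.name for c in front])      # ['net-a', 'net-b', 'net-d']
print(pick_for_budget(front, 1.0))  # net-b is the best fit under 1 ms
```

Restricting the budget lookup to the frontier is a design choice. Any dominated model is never the best pick for any budget, so a hub only needs to store the frontier models, one per latency tier.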

Tags: Computer science, CUDA, Deep learning, Neural networks, nVidia, nVidia Quadro GV100, Performance, Tesla A100

May 8, 2022 by hgpu
