high performance computing on graphics processing units: hgpu.org


Exascale Deep Learning for Scientific Inverse Problems

Nouamane Laanait, Joshua Romero, Junqi Yin, M. Todd Young, Sean Treichler, Vitalii Starchenko, Albina Borisevich, Alex Sergeev, Michael Matheson

Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

arXiv:1909.11150 [cs.LG], (24 Sep 2019)


Package: horovod (distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet)


We introduce novel communication strategies for synchronous distributed deep learning, consisting of decentralized orchestration of gradient reductions and computational-graph-aware grouping of gradient tensors. These techniques produce an optimal overlap between computation and communication and result in near-linear scaling (efficiency of 0.93) of distributed training on up to 27,600 NVIDIA V100 GPUs on the Summit supercomputer. We demonstrate the gradient reduction techniques by training a fully convolutional neural network to approximate the solution of a longstanding scientific inverse problem in materials imaging. Efficient distributed training on a 0.5 PB dataset produces a model capable of atomically accurate reconstruction of materials, reaching a peak performance of 2.15(4) EFLOPS_16 in the process.
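To make the idea of grouping gradient tensors more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation and does not use Horovod's API. It assumes a simple greedy rule (pack tensors, in the order backpropagation produces them, into groups of at most `max_elems` elements) and simulates the collective reduction with an element-wise mean across workers. All names (`group_tensors`, `allreduce_mean`, `reduce_grouped`, `max_elems`, the `conv*` tensor names) are hypothetical.

```python
from typing import Dict, List, Sequence

# Tensor name -> flattened gradient values held by one worker.
Gradients = Dict[str, List[float]]


def group_tensors(order: Sequence[str], sizes: Dict[str, int],
                  max_elems: int) -> List[List[str]]:
    """Greedily pack tensors into groups, following the order in which
    backpropagation is assumed to produce them (last layer first)."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_elems = 0
    for name in order:
        # Close the current group if adding this tensor would overflow it.
        if current and current_elems + sizes[name] > max_elems:
            groups.append(current)
            current, current_elems = [], 0
        current.append(name)
        current_elems += sizes[name]
    if current:
        groups.append(current)
    return groups


def allreduce_mean(buffers: List[List[float]]) -> List[float]:
    """Stand-in for a collective allreduce: element-wise mean over workers."""
    n_workers = len(buffers)
    return [sum(values) / n_workers for values in zip(*buffers)]


def reduce_grouped(worker_grads: List[Gradients],
                   groups: List[List[str]]) -> Gradients:
    """Issue one simulated allreduce per group instead of one per tensor."""
    reduced: Gradients = {}
    for group in groups:
        # Fuse each worker's tensors in this group into one flat buffer.
        fused = [[v for name in group for v in grads[name]]
                 for grads in worker_grads]
        mean = allreduce_mean(fused)
        # Split the reduced buffer back into per-tensor gradients.
        offset = 0
        for name in group:
            length = len(worker_grads[0][name])
            reduced[name] = mean[offset:offset + length]
            offset += length
    return reduced


if __name__ == "__main__":
    order = ["conv3", "conv2", "conv1"]  # assumed backprop order
    sizes = {"conv3": 2, "conv2": 2, "conv1": 3}
    groups = group_tensors(order, sizes, max_elems=4)
    workers = [
        {"conv3": [1.0, 2.0], "conv2": [3.0, 4.0], "conv1": [5.0, 6.0, 7.0]},
        {"conv3": [3.0, 4.0], "conv2": [5.0, 6.0], "conv1": [7.0, 8.0, 9.0]},
    ]
    print(groups)
    print(reduce_grouped(workers, groups))
```

In this toy setup, three tensors are reduced with two collective calls rather than three. Fewer, larger messages is the general motivation for grouping; the graph-aware grouping and decentralized orchestration described in the abstract are more involved than this greedy rule.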

Tags: Computer science, Deep learning, Neural networks, nVidia, Package, Tesla V100

September 29, 2019 by hgpu

