high performance computing on graphics processing units: hgpu.org

Synkhronos: a Multi-GPU Theano Extension for Data Parallelism

Adam Stooke, Pieter Abbeel

University of California, Berkeley

arXiv:1710.04162 [cs.DC], (11 Oct 2017)

We present Synkhronos, an extension to Theano for multi-GPU computations leveraging data parallelism. Our framework provides automated execution and synchronization across devices, allowing users to continue to write serial programs without risk of race conditions. The NVIDIA Collective Communication Library is used for high-bandwidth inter-GPU communication. Further enhancements to the Theano function interface include input slicing (with aggregation) and input indexing, which perform common data-parallel computation patterns efficiently. One example use case is synchronous SGD, which has recently been shown to scale well for a growing set of deep learning problems. When training ResNet-50, we achieve a near-linear speedup of 7.5x on an NVIDIA DGX-1 using 8 GPUs, relative to Theano-only code running on a single GPU in isolation. Yet Synkhronos remains general to any data-parallel computation programmable in Theano. By implementing parallelism at the level of individual Theano functions, our framework uniquely addresses a niche between manual multi-device programming and prescribed multi-GPU training routines.
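The data-parallel pattern the abstract describes (slice the input across devices, compute on each slice, aggregate the results, then apply one synchronized update) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Synkhronos API: it makes no Theano or NCCL calls, the "workers" are simulated sequentially in one process, and every function name here (`slice_batch`, `shard_gradient_sum`, `all_reduce_sum`, `sync_sgd_step`, `parallel_efficiency`) is hypothetical. The model is a one-variable linear regression chosen only to keep the example short.

```python
# Hypothetical illustration only: a pure-Python simulation of synchronous
# data-parallel SGD. It does not call Theano, NCCL, or Synkhronos.


def slice_batch(xs, ys, n_workers):
    """Split a batch into n_workers contiguous shards (input slicing)."""
    n = len(xs)
    bounds = [(i * n) // n_workers for i in range(n_workers + 1)]
    return [(xs[lo:hi], ys[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])]


def shard_gradient_sum(w, b, xs, ys):
    """Summed gradient of 0.5 * (w*x + b - y)**2 over one shard."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw, gb


def all_reduce_sum(partials):
    """Stand-in for a collective all-reduce: sum the per-worker results."""
    return tuple(sum(vals) for vals in zip(*partials))


def sync_sgd_step(w, b, xs, ys, n_workers, lr):
    """One synchronous step: slice, compute per shard, aggregate, update."""
    shards = slice_batch(xs, ys, n_workers)
    # On real hardware each shard would run on its own device in parallel.
    partials = [shard_gradient_sum(w, b, sx, sy) for sx, sy in shards]
    gw, gb = all_reduce_sum(partials)
    n = len(xs)  # normalize by the full batch size, not the shard size
    return w - lr * gw / n, b - lr * gb / n


def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear scaling achieved."""
    return speedup / n_devices


if __name__ == "__main__":
    xs = [float(i) for i in range(16)]
    ys = [2.0 * x + 1.0 for x in xs]
    serial = sync_sgd_step(0.5, 0.25, xs, ys, n_workers=1, lr=0.01)
    parallel = sync_sgd_step(0.5, 0.25, xs, ys, n_workers=8, lr=0.01)
    print("serial step:  ", serial)
    print("8-worker step:", parallel)
    print("efficiency at 7.5x on 8 devices:", parallel_efficiency(7.5, 8))
```

Because the per-shard gradients are summed before the single division by the full batch size, the 8-worker step produces the same update as the serial step, which is the property that lets a synchronous scheme keep serial semantics. The last line restates the reported figure as a ratio: a 7.5x speedup on 8 GPUs corresponds to 7.5 / 8 = 0.9375 of ideal linear scaling.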

Tags: Computer science, CUDA, Data parallelism, Deep learning, nVidia, nVidia DGX-1, Package, Tesla P100, Theano

October 15, 2017 by hgpu
