Weighted Residuals for Very Deep Networks
Peking University
arXiv:1605.08831 [cs.CV] (28 May 2016)
@article{shen2016weighted,
  title={Weighted Residuals for Very Deep Networks},
  author={Shen, Falong and Zeng, Gang},
  journal={arXiv preprint arXiv:1605.08831},
  year={2016},
  month={may},
  eprint={1605.08831},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Deep residual networks have recently shown appealing performance on many challenging computer vision tasks. However, the original residual structure still has defects that make very deep networks difficult to converge. In this paper, we introduce a weighted residual network to address the incompatibility between ReLU and element-wise addition, as well as the initialization problem of very deep networks. The weighted residual network learns to combine residuals from different layers effectively and efficiently. The proposed models enjoy consistent improvements in accuracy and convergence as depth increases from 100+ layers to 1000+ layers. Moreover, weighted residual networks incur little extra computation and GPU memory overhead compared with the original residual networks. The networks are optimized by projected stochastic gradient descent. Experiments on CIFAR-10 show that our algorithm converges faster than the original residual networks and reaches 95.3% accuracy with a 1192-layer model.
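The abstract describes two ingredients: a learnable scalar weight that gates each residual branch before element-wise addition, and projected stochastic gradient descent that keeps that weight inside a feasible range. The following is a minimal NumPy sketch of one plausible reading, not the paper's exact formulation: the residual function `F`, the initial weight of zero, and the projection interval [0, 1] are all illustrative assumptions.

```python
import numpy as np

def residual_branch(x, W):
    # Toy residual function F(x) = ReLU(W x); a real network would use
    # convolutions and batch normalization here (assumption, not the paper's exact F).
    return np.maximum(W @ x, 0.0)

def weighted_residual_forward(x, W, lam):
    # Weighted residual block: y = x + lam * F(x).
    # The scalar lam gates the residual before the element-wise addition.
    return x + lam * residual_branch(x, W)

def project(lam, lo=0.0, hi=1.0):
    # Projection step of projected SGD: after each gradient update,
    # clip lam back into its feasible interval (interval is an assumption).
    return float(np.clip(lam, lo, hi))

# Toy usage on a random 4-dimensional input.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((4, 4))

lam = 0.0  # lam = 0 makes the block an identity mapping at initialization
y = weighted_residual_forward(x, W, lam)
print(np.allclose(y, x))  # prints True: the block starts as the identity

# One projected SGD step on lam with a stand-in gradient value.
grad_lam = -5.0
lam = project(lam - 0.1 * grad_lam)
print(lam)  # prints 0.5: the unconstrained step lands inside [0, 1]
```

Initializing `lam` at zero means every block starts as the identity, which is one way to read the paper's claim that weighted residuals ease the initialization of 1000+-layer networks: early in training the signal passes through unchanged, and each layer's residual contribution is grown gradually.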
June 2, 2016 by hgpu