Lukasz Kaiser, Ilya Sutskever
Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal, NTMs have a weakness caused by their sequential […]
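The snippet cuts off before naming that weakness, but the core NTM idea it mentions, memory access that is itself differentiable so backpropagation can shape the "program", can be sketched briefly. The `soft_read` helper and toy sizes below are illustrative assumptions, not code from the paper:

```python
import numpy as np

# Minimal sketch of an NTM-style differentiable memory read: instead of
# indexing a single slot, the controller emits a softmax weighting over
# all slots, so the read is differentiable and backpropagation can adjust
# the addressing itself.

def soft_read(memory, key, beta=1.0):
    """memory: (slots, width); key: (width,); beta: sharpness."""
    # Cosine similarity between the key and every memory slot.
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    )
    weights = np.exp(beta * sims)
    weights /= weights.sum()            # softmax attention over slots
    return weights @ memory             # weighted (soft) read vector

memory = np.random.randn(8, 4)          # 8 slots of width 4 (toy sizes)
print(soft_read(memory, np.random.randn(4)))
```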
Ziming Zhang, Yuting Chen, Venkatesh Saligrama
In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively "shallow" networks, limited both by issues arising in backpropagation (vanishing gradients) and by computational efficiency. We propose a novel and efficient training algorithm inspired by the alternating direction method of […]
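As a rough illustration of what "learning hash codes" with a deep network means, a forward pass followed by sign thresholding yields binary codes. The two-layer net, sizes, and names below are hypothetical; the paper's actual architecture and training algorithm are not visible in the snippet:

```python
import numpy as np

# Illustrative sketch only: a deep network for hashing typically maps an
# input to a real-valued code and binarizes it by thresholding.
# This toy two-layer net is NOT the paper's model.

def forward(x, W1, W2):
    h = np.tanh(x @ W1)        # hidden layer
    return np.tanh(h @ W2)     # real-valued code in (-1, 1)

def hash_code(x, W1, W2):
    return (forward(x, W1, W2) > 0).astype(np.uint8)  # binary hash bits

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((32, 16)), rng.standard_normal((16, 8))
print(hash_code(rng.standard_normal(32), W1, W2))     # e.g. an 8-bit code
```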
Linnan Wang, Wei Wu, Jianxiong Xiao, Yang Yi
This paper describes a method for accelerating large-scale Artificial Neural Network (ANN) training on multiple GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix multiplication and integrate the algorithm with ANN training. The experiments demonstrate that our matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous […]
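The reduction the abstract mentions is standard for fully connected layers: the forward pass and both backward-pass gradients are each one matrix multiplication (GEMM), which is why multi-GPU GEMM is the bottleneck worth optimizing. A minimal sketch with toy shapes:

```python
import numpy as np

# Forward and backward passes of a fully connected layer as plain GEMMs.

X = np.random.randn(64, 128)        # batch of inputs
W = np.random.randn(128, 256)       # layer weights
Y = X @ W                           # forward pass: one GEMM

dY = np.random.randn(*Y.shape)      # gradient arriving from the next layer
dW = X.T @ dY                       # weight gradient: another GEMM
dX = dY @ W.T                       # input gradient: a third GEMM
```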
Patrick O. Glauner
This thesis describes the design and implementation of a smile detector based on deep convolutional neural networks. It starts with a summary of neural networks, the difficulties of training them, and newer training methods such as Restricted Boltzmann Machines and autoencoders. It then provides a literature review of convolutional neural networks and recurrent neural networks. […]
Piotr Szwed, Wojciech Chmiel
This paper presents a multi-swarm PSO algorithm for the Quadratic Assignment Problem (QAP) implemented on the OpenCL platform. Our work was motivated by time-efficiency tests performed on a single-swarm implementation, which showed clearly that the benefits of a parallel execution platform can be fully exploited if the processed population is large. The described […]
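For reference, the core PSO update that such implementations parallelize across particles is the velocity/position rule below. This is a generic continuous-space sketch; a real QAP solver operates on permutations and needs a discrete encoding, which is not shown here:

```python
import numpy as np

# Canonical PSO velocity/position update: each particle is pulled toward
# its personal best (pbest) and the swarm's global best (gbest).

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=np.random):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v                 # new positions and velocities

x = np.random.randn(100, 10)        # 100 particles, 10 dimensions (toy)
v = np.zeros_like(x)
x, v = pso_step(x, v, pbest=x.copy(), gbest=x[0])
```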
Kyuyeon Hwang, Wonyong Sung
Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training times, which demand parallel implementations of the training procedure. Parallelizing the training algorithms for RNNs is very challenging because internal recurrent paths form dependencies between different time frames. In this paper, we first propose a […]
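The dependency the abstract points to is visible in a plain forward pass: each hidden state needs the previous one, so the time loop is inherently sequential even though the per-step matrix products parallelize. A toy sketch:

```python
import numpy as np

# Why RNN training resists parallelization: the loop over time below is
# inherently sequential because h_t depends on h_{t-1}, while the matrix
# products inside each step can still run in parallel.

def rnn_forward(xs, Wx, Wh, h0):
    h, states = h0, []
    for x in xs:                      # dependency between time frames
        h = np.tanh(x @ Wx + h @ Wh)  # h_t needs h_{t-1}
        states.append(h)
    return states

T, d = 20, 8                          # toy sequence length and width
xs = np.random.randn(T, d)
states = rnn_forward(xs, np.random.randn(d, d), np.random.randn(d, d),
                     np.zeros(d))
```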
Ken Miura, Tatsuya Harada
Deep learning can achieve outstanding results in various fields. However, it requires such significant computational power that graphics processing units (GPUs) and/or numerous computers are often needed in practice. We have developed a new distributed calculation framework called "Sashimi" that allows any computer to be used as a distribution node simply by accessing […]
Andrew Lavin
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures using a single kernel. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We address only the forward propagation (FPROP) operation of the network, but […]
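One common way to cast convolution as the dense matrix multiply that SGEMM-style kernels excel at is im2col lowering, sketched below. This illustrates the general formulation only; it is not maxDNN's Maxwell assembly kernel:

```python
import numpy as np

# Generic im2col lowering of FPROP convolution to one matrix product:
# unroll every receptive field into a row, then multiply by the flattened
# filter. Single channel, valid padding, toy sizes.

def conv2d_im2col(x, w):
    """x: (H, W) image; w: (k, k) filter."""
    k = w.shape[0]
    H, W = x.shape
    oh, ow = H - k + 1, W - k + 1
    cols = np.empty((oh * ow, k * k))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + k, j:j + k].ravel()
    return (cols @ w.ravel()).reshape(oh, ow)   # conv as one matrix product

x = np.random.randn(8, 8)
w = np.random.randn(3, 3)
print(conv2d_im2col(x, w).shape)                 # (6, 6)
```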
Mehdi Sajjadi, Mojtaba Seyedhosseini, Tolga Tasdizen
Artificial neural networks are powerful pattern classifiers; however, they have been surpassed in accuracy by methods such as support vector machines and random forests, which are also easier to use and faster to train. Backpropagation, which is used to train artificial neural networks, suffers from the herd-effect problem, which leads to long training times […]
Pavel Hala
There is a great need for accurate and autonomous spectral classification methods in astrophysics. This thesis is about training a convolutional neural network (ConvNet) to recognize an object class (quasar, star, or galaxy) from one-dimensional spectra alone. The author developed several scripts and C programs for dataset preparation, preprocessing, and post-processing of the data. The EBLearn library […]
Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, Yann LeCun
We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier Transform convolution implementations: one based on NVIDIA's cuFFT library, and another based on a Facebook-authored FFT implementation, fbfft, that provides significant speedups over cuFFT (over 1.5x) for whole CNNs. […]
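The convolution theorem underlying cuFFT- and fbfft-style layers: pointwise multiplication in the frequency domain equals (circular) convolution in the spatial domain, so large-filter convolutions become FFTs plus a product. A toy single-channel sketch:

```python
import numpy as np

# FFT-based convolution via the convolution theorem: transform, multiply
# pointwise, inverse-transform. Real CNN layers batch this over channels
# and handle boundaries; this toy version uses circular boundaries.

def fft_conv2d(x, w):
    """Circular 2-D convolution of x with filter w."""
    Wf = np.fft.fft2(w, s=x.shape)          # zero-pad filter to image size
    return np.real(np.fft.ifft2(np.fft.fft2(x) * Wf))

x = np.random.randn(16, 16)
w = np.random.randn(5, 5)
print(fft_conv2d(x, w).shape)                # (16, 16), circular boundary
```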
Min Lin, Shuo Li, Xuan Luo, Shuicheng Yan
In this paper, we introduce a novel deep learning framework, termed Purine. In Purine, a deep network is expressed as a bipartite graph (bi-graph) composed of interconnected operators and data tensors. With the bi-graph abstraction, networks are easily solvable with an event-driven task dispatcher. We then demonstrate that different parallelism schemes over GPUs and/or […]
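A bi-graph in this sense keeps operators and tensors as disjoint node classes, with edges only between the two, so a dispatcher can fire an operator the moment all of its input tensors exist. The dict-based structure and names below are illustrative assumptions, not Purine's API:

```python
# Toy rendering of a bi-graph: operators and tensors are the two node
# classes, and edges only connect an operator to its input/output tensors.

bigraph = {
    "tensors":   ["x", "w", "y", "loss"],
    "operators": ["matmul", "softmax_loss"],
    "edges": [                      # (source, destination) pairs
        ("x", "matmul"), ("w", "matmul"), ("matmul", "y"),
        ("y", "softmax_loss"), ("softmax_loss", "loss"),
    ],
}

def ready_ops(done_tensors, g):
    """Event-driven dispatch: an operator fires once all its inputs exist."""
    return [op for op in g["operators"]
            if all(src in done_tensors
                   for src, dst in g["edges"] if dst == op)]

print(ready_ops({"x", "w"}, bigraph))        # ['matmul']
```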
