13390
Qi Lyu, Jun Zhu
Long Short-Term Memory (LSTM) is a deep recurrent neural network architecture with high computational complexity. Contrary to the standard practice to train LSTM online with stochastic gradient descent (SGD) methods, we propose a matrix-based batch learning method for LSTM with full Backpropagation Through Time (BPTT). We further solve the state drifting issues as well as […]
View View   Download Download (PDF)   
Xiangyu Li
MapReduce is a programming model capable of processing massive data in parallel across hundreds of computing nodes in a cluster. It hides many of the complicated details of parallel computing and provides a straightforward interface for programmers to adapt their algorithms to improve productivity. Many MapReduce-based applications have utilized the power of this model, including […]
View View   Download Download (PDF)   
Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang
We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can […]
Albert Haque
Cardiac dysrhythmia is responsible for over half a million deaths in the United States annually. In this work, we evaluate the performance of neural networks on classifying electrocardiogram (ECG) sequences as normal or abnormal (arrhythmia). Using neural networks as our primary learning model, we explain our model’s performance and discuss hyperparameter tuning. Comparing the results […]
Mehdi Sajjadi, Mojtaba Seyedhosseini, Tolga Tasdizen
Artificial neural networks are powerful pattern classifiers; however, they have been surpassed in accuracy by methods such as support vector machines and random forests that are also easier to use and faster to train. Backpropagation, which is used to train artificial neural networks, suffers from the herd effect problem which leads to long training times […]
View View   Download Download (PDF)   
Pavel Hala
There is a great need for accurate and autonomous spectral classification methods in astrophysics. This thesis is about training a convolutional neural network (ConvNet) to recognize an object class (quasar, star or galaxy) from one-dimension spectra only. Author developed several scripts and C programs for datasets preparation, preprocessing and post-processing of the data. EBLearn library […]
View View   Download Download (PDF)   
Tianyi David Han, Tarek S. Abdelrahman
The use of local memory is important to improve the performance of OpenCL programs. However, its use may not always benefit performance, depending on various application characteristics, and there is no simple heuristic for deciding when to use it. We develop a machine learning model to decide if the optimization is beneficial or not. We […]
View View   Download Download (PDF)   
Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, Yann LeCun
We examine the performance profile of Convolutional Neural Network training on the current generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier Transform convolution implementations: one based on NVIDIA’s cuFFT library, and another based on a Facebook authored FFT implementation, fbfft, that provides significant speedups over cuFFT (over 1.5x) for whole CNNs. […]
Min Lin, Shuo Li, Xuan Luo, Shuicheng Yan
In this paper, we introduce a novel deep learning framework, termed Purine. In Purine, a deep network is expressed as a bipartite graph (bi-graph), which is composed of interconnected operators and data tensors. With the bi-graph abstraction, networks are easily solvable with event-driven task dispatcher. We then demonstrate that different parallelism schemes over GPUs and/or […]
View View   Download Download (PDF)   
Hongsheng Li, Rui Zhao, Xiaogang Wang
We present highly efficient algorithms for performing forward and backward propagation of Convolutional Neural Network (CNN) for pixelwise classification on images. For pixelwise classification tasks, such as image segmentation and object detection, surrounding image patches are fed into CNN for predicting the classes of centered pixels via forward propagation and for updating CNN parameters via […]
View View   Download Download (PDF)   
Andrea Vedaldi, Karel Lenc
MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and many more. In this manner, MatConvNet allows fast prototyping of […]
View View   Download Download (PDF)   
Minjie Wang, Tianjun Xiao, Jianpeng Li, Jiaxing Zhang, Chuntao Hong, Zheng Zhang
The tooling landscape of deep learning is fragmented by a growing gap between the generic and productivity-oriented tools that optimize for algorithm development and the task-specific ones that optimize for speed and scale. This creates an artificial barrier to bring new innovations into real-world applications. Minerva addresses this issue with a layered design that provides […]
View View   Download Download (PDF)   
Page 1 of 1112345...10...Last »

* * *

* * *

Like us on Facebook

HGPU group

211 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1373 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: