Firas Abuzaid, Stefan Hadjis, Ce Zhang, Christopher Re
We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve up to one […]
View | Download (PDF)
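The excerpt does not spell out the batching optimizations; in Caffe-style frameworks, the standard CPU technique is to lower convolution onto one large matrix multiplication per mini-batch (im2col + GEMM), so that a single big GEMM saturates the BLAS library far better than many small per-image convolutions. A minimal NumPy sketch of that lowering (all names are illustrative, not CcT's actual code):

```python
import numpy as np

def im2col(x, k):
    """Unfold k x k patches of a batch of images into columns.
    x: (N, C, H, W) -> (N, C*k*k, H_out*W_out)"""
    N, C, H, W = x.shape
    H_out, W_out = H - k + 1, W - k + 1
    cols = np.empty((N, C * k * k, H_out * W_out), dtype=x.dtype)
    idx = 0
    for c in range(C):
        for i in range(k):
            for j in range(k):
                patch = x[:, c, i:i + H_out, j:j + W_out]
                cols[:, idx, :] = patch.reshape(N, -1)
                idx += 1
    return cols

def conv_forward_batched(x, w):
    """Convolution for the whole mini-batch via one batched GEMM.
    w: (F, C, k, k) filters; returns (N, F, H_out, W_out)."""
    N, C, H, W = x.shape
    F, _, k, _ = w.shape
    cols = im2col(x, k)                          # (N, C*k*k, P)
    w_mat = w.reshape(F, -1)                     # (F, C*k*k)
    out = np.einsum('fk,nkp->nfp', w_mat, cols)  # batched GEMM
    return out.reshape(N, F, H - k + 1, W - k + 1)

# Tiny smoke test
x = np.random.randn(8, 3, 32, 32).astype(np.float32)
w = np.random.randn(16, 3, 3, 3).astype(np.float32)
print(conv_forward_batched(x, w).shape)  # (8, 16, 30, 30)
```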
Keven (Kedao) Wang
This project classifies images in the Tiny ImageNet Challenge, a dataset with 200 classes and 500 training examples for each class. Three network architectures are evaluated: a traditional architecture with 4 convolutional layers + 2 fully-connected layers; a Tiny GoogleNet with 3 inception layers; and a pyramid representation-based network. Tiny GoogleNet achieved the highest top-1 validation […]
View | Download (PDF)
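For readers unfamiliar with inception layers: they run several convolutions of different receptive fields in parallel and concatenate the results along the channel axis. A minimal PyTorch sketch of such a module (channel counts are illustrative; the report's exact configuration is not given in this excerpt):

```python
import torch
import torch.nn as nn

class TinyInception(nn.Module):
    """Minimal inception-style block: parallel 1x1, 3x3, 5x5 conv
    branches plus a max-pool projection, concatenated along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),   # 1x1 reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 16, kernel_size=5, padding=2))
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # Output channels: 16 + 32 + 16 + 16 = 80
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)

x = torch.randn(4, 64, 16, 16)
print(TinyInception(64)(x).shape)  # torch.Size([4, 80, 16, 16])
```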
Aanchan Mohan, Richard Rose
Multi-task learning (MTL) for deep neural network (DNN) multilingual acoustic models has been shown to be effective for learning parameters that are common or shared between multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training. This output […]
View | Download (PDF)
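Schematically, the MTL setup shares the hidden layers across languages while each language keeps its own output layer, which is why the output-layer parameter count scales with the number of languages. A hedged PyTorch sketch of that arrangement (dimensions illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class MultilingualDNN(nn.Module):
    """Shared hidden layers; one output (softmax) layer per language.
    The per-language heads are the parameters that grow with the
    number of training languages."""
    def __init__(self, feat_dim, hidden_dim, senones_per_lang):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
        # One classification head per language
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, n) for n in senones_per_lang])

    def forward(self, x, lang):
        return self.heads[lang](self.shared(x))  # logits for that language

model = MultilingualDNN(feat_dim=440, hidden_dim=1024,
                        senones_per_lang=[3000, 2500, 4000])
x = torch.randn(32, 440)
print(model(x, lang=1).shape)  # torch.Size([32, 2500])
```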
Dayong Wang, Anil K. Jain
Face retrieval is an enabling technology for many applications, including automatic face annotation, deduplication, and surveillance. In this paper, we propose a face retrieval system which combines a k-NN search procedure with a COTS matcher (PittPatt) in a cascaded manner. In particular, given a query face, we first pre-filter the gallery set and find the […]
View | Download (PDF)
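The cascade can be summarized as: a cheap k-NN search prunes the gallery, then the expensive COTS matcher re-ranks only the shortlist. A sketch of that control flow (PittPatt's scoring is proprietary, so cots_match_score below is a made-up placeholder):

```python
import numpy as np

def knn_prefilter(query_feat, gallery_feats, k):
    """Stage 1: cheap k-NN search over face features."""
    d = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(d)[:k]          # indices of k nearest gallery faces

def cots_match_score(query_img, gallery_img):
    """Placeholder for the commercial matcher (e.g. PittPatt); its real
    scoring function is proprietary, so we fake one here."""
    return -np.abs(query_img.mean() - gallery_img.mean())

def cascaded_retrieval(query_img, query_feat,
                       gallery_imgs, gallery_feats, k=100):
    """Stage 2: re-rank only the k-NN shortlist with the costly matcher."""
    shortlist = knn_prefilter(query_feat, gallery_feats, k)
    scores = [cots_match_score(query_img, gallery_imgs[i])
              for i in shortlist]
    return shortlist[np.argsort(scores)[::-1]]  # best matches first
```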
Kyuyeon Hwang, Wonyong Sung
Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training times, which demands parallel implementations of the training procedure. Parallelization of the training algorithms for RNNs is very challenging because internal recurrent paths form dependencies between different time frames. In this paper, we first propose a […]
View | Download (PDF)
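The cross-frame dependency the authors describe is visible in the standard RNN recurrence (generic textbook notation, not taken from the paper):

```latex
h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad
y_t = g(W_{hy} h_t + b_y)
```

Since h_t cannot be computed before h_{t-1}, the time dimension resists naive parallelization; parallelism has to come from elsewhere, such as across hidden units or across multiple sequences.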
Ken Miura, Tatsuya Harada
Deep learning can achieve outstanding results in various fields. However, it requires such significant computational power that graphics processing units (GPUs) and/or numerous computers are often needed for practical applications. We have developed a new distributed calculation framework called "Sashimi" that allows any computer to be used as a distribution node only by accessing […]
Zheng Yi Wu, Mahmoud Elmaghraby
The artificial neural network (ANN) is widely applied as a data-driven modeling tool in hydroinformatics due to its broad applicability in handling implicit and nonlinear relationships between input and output data. To obtain a reliable ANN model, training the ANN on the data is essential, but training usually takes many hours for a large […]
View | Download (PDF)
Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sebastien Jean, Pierre Froumenty, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which […]
View | Download (PDF)
Gene Wu, Joseph L. Greathouse, Alexander Lyashevsky, Nuwan Jayasena, Derek Chiou
Graphics Processing Units (GPUs) have numerous configuration and design options, including core frequency, number of parallel compute units (CUs), and available memory bandwidth. At many stages of the design process, it is important to estimate how application performance and power are impacted by these options. This paper describes a GPU performance and power estimation model […]
View | Download (PDF)
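The paper's model details are not included in this excerpt, but the flavor of such estimation can be conveyed by a first-order, roofline-style calculation in which a kernel is bounded by either compute throughput (scaling with CUs and frequency) or memory bandwidth. An illustrative sketch, not the authors' model:

```python
def estimate_exec_time(flops, bytes_moved, n_cus, freq_ghz, mem_bw_gbs,
                       flops_per_cu_per_cycle=128):
    """First-order estimate: a kernel is limited either by compute
    throughput (scales with CUs and frequency) or by memory bandwidth.
    Illustrative only; real models account for much more."""
    compute_s = flops / (n_cus * freq_ghz * 1e9 * flops_per_cu_per_cycle)
    memory_s = bytes_moved / (mem_bw_gbs * 1e9)
    return max(compute_s, memory_s)

# How does halving bandwidth affect a memory-bound kernel?
base = estimate_exec_time(1e12, 4e10, n_cus=32, freq_ghz=1.0, mem_bw_gbs=320)
slow = estimate_exec_time(1e12, 4e10, n_cus=32, freq_ghz=1.0, mem_bw_gbs=160)
print(slow / base)  # 2.0: runtime doubles when bandwidth is the bottleneck
```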
Dashiell Bodington, Eric Greenstein, Matthew Hu
This paper investigates traffic sign classification, which is an important problem to solve for autonomous driving. Linear discriminant analysis and convolutional neural networks achieved accuracies of 98.25% and 98.75%, respectively, when classifying eight different types of traffic signs. The CNN was implemented on a GPU for real-time traffic sign classification: testing time for the […]
View | Download (PDF)
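As a reference point for the LDA baseline, the scikit-learn API makes such a classifier a few lines; the data below is a random stand-in, not the paper's dataset:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical stand-ins for flattened traffic-sign images and labels
X_train = np.random.rand(800, 32 * 32)   # the paper uses 8 sign classes
y_train = np.random.randint(0, 8, 800)
X_test = np.random.rand(200, 32 * 32)

clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_train)        # fit class-conditional linear model
pred = clf.predict(X_test)       # fast linear decision at test time
```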
Grady Williams, Eric Rombokas, Tom Daniel
We present an algorithm which combines recent advances in model based path integral control with machine learning approaches to learning forward dynamics models. We take advantage of the parallel computing power of a GPU to quickly take a massive number of samples from a learned probabilistic dynamics model, which we use to approximate the path […]
View | Download (PDF)
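In path integral control, many noisy rollouts of the dynamics model are scored and combined with exponentiated-cost weights; the GPU's role is to run those rollouts in parallel. A CPU-side NumPy sketch of the update in a simplified form (not the authors' exact algorithm):

```python
import numpy as np

def path_integral_update(u_nom, dynamics, cost, x0, n_samples=1000,
                         noise_std=0.5, lam=1.0):
    """Sample perturbed control sequences through a (learned) dynamics
    model and combine them with exponentiated-cost weights. On a GPU
    all n_samples rollouts would run in parallel; a loop stands in."""
    T, m = u_nom.shape
    eps = np.random.randn(n_samples, T, m) * noise_std   # control noise
    costs = np.zeros(n_samples)
    for s in range(n_samples):
        x = x0
        for t in range(T):
            x = dynamics(x, u_nom[t] + eps[s, t])        # rollout
            costs[s] += cost(x)
    w = np.exp(-(costs - costs.min()) / lam)             # path-integral weights
    w /= w.sum()
    return u_nom + np.einsum('s,stm->tm', w, eps)        # weighted update

# Toy example: drive a 1-D point mass toward the origin
dyn = lambda x, u: x + 0.1 * u
cst = lambda x: float(x[0] ** 2)
u = path_integral_update(np.zeros((20, 1)), dyn, cst, x0=np.array([1.0]))
```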
Ken Miura, Tetsuaki Mano, Atsushi Kanehira, Yuichiro Tsuchiya, Tatsuya Harada
MILJS is a collection of state-of-the-art, platform-independent, scalable, fast JavaScript libraries for matrix calculation and machine learning. Our core library, which provides matrix calculation, is called Sushi; it exhibits far better performance than other leading machine learning libraries written in JavaScript. In particular, our matrix multiplication is 177 times faster than the fastest JavaScript benchmark. […]
* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units (detailed below). There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL projects should be uploaded via the User dashboard (see instructions and an example there); compilation and execution terminal output logs will be provided to the user.
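For orientation, a completed project of the sort that runs on these nodes could be as small as an OpenCL vector addition. A sketch using the pyopencl bindings (any host language that issues OpenCL calls would do; file layout and build steps are up to the user):

```python
import numpy as np
import pyopencl as cl

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)

ctx = cl.create_some_context()           # picks a GPU on the node
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
""").build()

prg.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)
out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
print(np.allclose(out, a + b))           # True
```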

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: