Anton Akusok, Kaj-Mikael Bjork, Yoan Miche, Amaury Lendasse
This work presents a complete approach to the successful utilization of a high-performance Extreme Learning Machines (ELMs) Toolbox for Big Data. It summarizes recent advances in algorithmic performance; gives a fresh view of the ELM solution in relation to traditional linear algebraic performance; and reaps the latest software and hardware performance achievements. The […]
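
The linear-algebraic core that such a toolbox accelerates is simple enough to sketch. Below is a minimal numpy illustration of a basic ELM (random hidden projection plus a least-squares readout); it is a sketch of the general technique, not the toolbox's actual API.

```python
import numpy as np

def elm_train(X, y, n_hidden=100, seed=0):
    """Minimal ELM: random hidden projection + least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # The only "training" is one linear solve: beta minimizes ||H @ beta - y||.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because everything reduces to one dense matrix product and one least-squares solve, ELM performance on Big Data is largely a question of how well those two primitives are implemented, which is the linear-algebraic view the abstract alludes to.
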
Genlang Chen, Chenggang Lai, Miaoqing Huang
Sparse coding is a popular learning model in the machine learning field. However, due to the complexity of the learning model, its high computational cost has seriously hindered its application. To address this problem, this paper presents a parallel sparse coding method that improves performance by exploiting the power of acceleration technologies such as Intel […]
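
For context, the inference step that dominates sparse coding's cost can be written in a few lines. The ISTA solver below is a generic numpy sketch of sparse code inference with a fixed dictionary, not the parallel method of the paper:

```python
import numpy as np

def ista_sparse_codes(X, D, lam=0.1, n_iter=100):
    """Solve min_A 0.5*||X - D@A||_F^2 + lam*||A||_1 by ISTA.
    X: (m, n) data, D: (m, k) fixed dictionary, returns codes A: (k, n)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = D.T @ (D @ A - X)              # gradient of the smooth term
        Z = A - G / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft-thresholding
    return A
```

The two dense matrix products inside the loop are where the runtime goes, and they are exactly the kind of work that accelerators parallelize well.
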
Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S. Huang
Data embedding is used in many machine learning applications to create low-dimensional feature representations, which preserve the structure of data points in their original space. In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. Such networks are notoriously difficult to mine because of the bewildering combination […]
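
As a point of reference for the homogeneous case that heterogeneous-network methods generalize, here is a toy structure-preserving embedding (Laplacian eigenmaps) in numpy; it handles a single node type only and is not the paper's method:

```python
import numpy as np

def spectral_embedding(adj, dim=2):
    """Embed graph nodes so that linked nodes land close together."""
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj                  # graph Laplacian
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]               # skip the constant eigenvector

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(spectral_embedding(adj))              # one 2-D coordinate per node
```
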
Karthik Narayan, Ali Punjani, Pieter Abbeel
Although recent work in non-linear dimensionality reduction investigates multiple choices of divergence measure during optimization (Yang et al., 2013; Bunte et al., 2012), little work discusses the direct effects that divergence measures have on visualization. We study this relationship, theoretically and through an empirical analysis over 10 datasets. Our work shows how the alpha and […]
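
To make the role of the divergence concrete: the alpha family below interpolates between the two directions of the KL divergence, so the choice changes which mismatches between the high- and low-dimensional neighborhood distributions get penalized. This is the standard Amari parameterization (alpha not in {0, 1}), not code from the paper:

```python
import numpy as np

def kl(P, Q, eps=1e-12):
    """KL(P || Q) for discrete distributions."""
    P, Q = P + eps, Q + eps
    return np.sum(P * np.log(P / Q))

def alpha_divergence(P, Q, alpha, eps=1e-12):
    """Amari alpha-divergence for normalized P, Q.
    Tends to KL(P||Q) as alpha -> 1 and KL(Q||P) as alpha -> 0."""
    P, Q = P + eps, Q + eps
    return (np.sum(P**alpha * Q**(1 - alpha)) - 1.0) / (alpha * (alpha - 1.0))

P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.5, 0.3, 0.2])
print(kl(P, Q), alpha_divergence(P, Q, 0.99))  # the two values nearly agree
```
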
Thomas Nelson, Axel Rivera, Prasanna Balaprakash, Mary Hall, Paul D. Hovland, Elizabeth Jessup, Boyana Norris
Many scientific and numerical applications, including quantum chemistry modeling and fluid dynamics simulation, require tensor product and tensor contraction evaluation. Tensor computations are characterized by arrays with numerous dimensions, inherent parallelism, moderate data reuse and many degrees of freedom in the order in which to perform the computation. The best-performing implementation is heavily dependent on […]
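
The degrees of freedom the abstract refers to are easy to see with numpy's einsum, where the same contraction can be evaluated in different orders with very different flop counts:

```python
import numpy as np

A = np.random.rand(500, 4)
B = np.random.rand(4, 500)
C = np.random.rand(500, 8)

# D[i,l] = sum_{j,k} A[i,j] B[j,k] C[k,l]. The result is order-independent,
# but the cost is not: (A@B)@C builds a 500x500 intermediate, while A@(B@C)
# builds only a 4x8 one.
D = np.einsum('ij,jk,kl->il', A, B, C, optimize=True)
path, report = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
print(report)   # shows the chosen contraction order and the flop savings
```
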
Naser Sedaghati, Te Mu, Louis-Noel Pouchet, Srinivasan Parthasarathy, P. Sadayappan
Sparse matrix-vector multiplication (SpMV) is a core kernel in numerous applications, ranging from physics simulation and large-scale solvers to data analytics. Many GPU implementations of SpMV have been proposed, targeting several sparse representations and aiming at maximizing overall performance. No single sparse matrix representation is uniformly superior, and the best performing representation varies for sparse […]
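
As a reference point, here is the CSR kernel in plain numpy/scipy; GPU implementations parallelize this row loop, and the reason no single format wins uniformly is that the cost profile depends on how evenly nonzeros are distributed across rows:

```python
import numpy as np
import scipy.sparse as sp

def spmv_csr(indptr, indices, data, x):
    """Reference CSR SpMV: one sparse dot product per row."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        start, end = indptr[row], indptr[row + 1]
        y[row] = data[start:end] @ x[indices[start:end]]
    return y

A = sp.random(1000, 1000, density=0.01, format='csr', random_state=0)
x = np.ones(1000)
assert np.allclose(spmv_csr(A.indptr, A.indices, A.data, x), A @ x)
```
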
William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
Building effective optimization heuristics is a challenging task which often takes developers several months, if not years, to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data; however, obtaining this data can take months per platform. This is becoming an ever more critical problem as the pace of […]
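
The predictive-modelling idea is simply supervised learning over compiler decisions. The sketch below is a hypothetical illustration (the features and labels are invented, not from the paper): each row describes a kernel, and the label records which optimization choice performed best on a training platform.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [loop trip count, nesting depth, arithmetic intensity]
X_train = np.array([[1024, 2, 0.9],
                    [  64, 1, 0.1],
                    [4096, 3, 0.7],
                    [  32, 1, 0.2]])
y_train = np.array([1, 0, 1, 0])   # 1 = apply the optimization, 0 = do not

heuristic = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(heuristic.predict([[2048, 2, 0.8]]))   # decision for an unseen kernel
```

Collecting the training labels requires running every candidate configuration on real hardware, which is the months-per-platform cost the abstract highlights.
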
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus […]
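
The key data structure behind an RPN is the grid of anchors it scores. Below is a minimal numpy sketch of anchor enumeration, with illustrative shapes and defaults rather than the released implementation:

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Enumerate k = len(scales) * len(ratios) candidate boxes per feature-map
    cell; the RPN predicts an objectness score and box offsets for each."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = x * stride + stride / 2, y * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)   # area s^2, aspect w/h = r
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

print(make_anchors(2, 2).shape)   # (2 * 2 * 9, 4) = (36, 4) boxes
```
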
Thomas L. Falch, Anne C. Elster
Heterogeneous computing, which combines devices with different architectures, is rising in popularity and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good […]
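
In practice, poor performance portability means repeating a per-device tuning loop like the generic sketch below; `run_kernel` is a hypothetical callback that would build and time an OpenCL kernel with the given parameters:

```python
import itertools

def tune(run_kernel, param_space):
    """Exhaustively search a parameter space, keeping the fastest valid config."""
    best = None
    for values in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space.keys(), values))
        try:
            t = run_kernel(cfg)      # seconds; may raise if cfg is invalid on this device
        except Exception:
            continue
        if best is None or t < best[1]:
            best = (cfg, t)
    return best

space = {'work_group_size': [16, 32, 64, 128, 256],
         'vector_width':    [1, 2, 4, 8],
         'tile_size':       [8, 16, 32]}
# best_cfg, best_time = tune(run_kernel, space)   # must be re-run on every new device
```
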
Bart van Merrienboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio
We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano’s symbolic computational graph, and providing an extensive set of utilities to […]
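
For readers unfamiliar with the layer Blocks builds on, here is plain Theano: a symbolic graph, automatic gradients, and a compiled update function (executed on GPU when Theano is configured for CUDA). This is ordinary Theano usage, not the Blocks API itself:

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')                     # symbolic inputs
t = T.ivector('t')
W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')

p = T.nnet.softmax(T.dot(x, W) + b)   # softmax classifier
cost = T.nnet.categorical_crossentropy(p, t).mean()
grads = T.grad(cost, [W, b])          # symbolic differentiation
updates = [(v, v - 0.1 * g) for v, g in zip([W, b], grads)]

train_step = theano.function([x, t], cost, updates=updates)  # compiled SGD step
```

Blocks adds parametrized operations and graph metadata on top of exactly this kind of graph, and Fuel handles streaming large datasets into it.
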
Johann Hauswald, Yiping Kang, Michael A. Laurenzano, Quan Chen, Cheng Li, Trevor Mudge, Ronald G. Dreslinski, Jason Mars, Lingjia Tang
As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing, among others. A number of open questions arise as to the design of a server platform […]
Fatemah Ramzy AlZayer
We optimize parameters in OpenACC clauses for a stencil evaluation kernel executed on Graphics Processing Units (GPUs) using a variety of machine learning and optimization search algorithms, individually and in hybrid combinations, and compare execution-time performance to the best possible obtained from brute-force search. Several auto-tuning techniques – historic learning, random walk, simulated […]
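
Of the search techniques listed, simulated annealing is representative and compact enough to sketch. The `measure` and `neighbors` callbacks below are hypothetical stand-ins for timing the stencil kernel under a given OpenACC clause configuration and for perturbing that configuration:

```python
import math
import random

def anneal(measure, neighbors, start, steps=200, t0=1.0):
    """Simulated annealing over kernel-launch parameters (e.g. gang/vector sizes)."""
    cur, cur_t = start, measure(start)
    best, best_t = cur, cur_t
    for i in range(steps):
        temp = t0 * (1.0 - i / steps)         # linear cooling schedule
        cand = random.choice(neighbors(cur))
        cand_t = measure(cand)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature drops.
        if cand_t < cur_t or random.random() < math.exp((cur_t - cand_t) / max(temp, 1e-9)):
            cur, cur_t = cand, cand_t
        if cur_t < best_t:
            best, best_t = cur, cur_t
    return best, best_t
```
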


* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units (see the configurations below). There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see instructions and an example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
