13294

Posts

Dec, 20

Towards an automatic generation of dense linear algebra solvers on parallel architectures

The increasing complexity of new parallel architectures has widened the gap between adaptability and efficiency of the codes. As high performance numerical libraries tend to focus more on performance, we wish to address this issue using a C++ library called NT2. By analyzing the properties of the linear algebra domain that can be extracted from […]
Dec, 18

Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite

Power constraints are forcing HPC systems to continue to increase hardware concurrency. Efficiently scaling applications on future machines will be essential for improved science and it is recognised that the "flat" MPI model will start to reach its scalability limits. The optimal approach is unknown, necessitating the use of mini-applications to rapidly evaluate new approaches. […]
Dec, 18

Multicore Scheduling of Parallel Real-Time Tasks with Multiple Parallelization Options

Past researches on multicore scheduling assume that a computational unit has already been parallelized into a prefixed number of threads. However, with recent technologies such as OpenCL, a computational unit can be parallelized in many different ways with runtime selectable numbers of threads. This paper proposes an optimal algorithm for parallelizing and scheduling a set […]
Dec, 18

Efficient GPU Implementation for Single Block Orthogonal Dictionary Learning

Dictionary training for sparse representations involves dealing with large chunks of data and complex algorithms that determine time consuming implementations. SBO is an iterative dictionary learning algorithm based on constructing unions of orthonormal bases via singular value decomposition, that represents each data item through a single best fit orthobase. In this paper we present a […]
Dec, 18

GPU-Powered Coherent Beamforming

GPU-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimised for deployment at the BEST-2 array […]
Dec, 18

DeepSpeech: Scaling up end-to-end speech recognition

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, […]
Dec, 16

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

We present highly efficient algorithms for performing forward and backward propagation of Convolutional Neural Network (CNN) for pixelwise classification on images. For pixelwise classification tasks, such as image segmentation and object detection, surrounding image patches are fed into CNN for predicting the classes of centered pixels via forward propagation and for updating CNN parameters via […]
Dec, 16

Multi-Centroid PSO Classification Learning on the GPU

Training classifiers can be seen as an optimization problem. With this view, we have developed a method to train a type of nearest centroid classifier with PSO. Results showed an improvement on most of the datasets tested. Additionally, we have developed a method to utilize the developed classifier with datasets containing both numeric and categorical […]
Dec, 16

An Optimized GPU Memory Hierarchy Design for an OpenCL Kernel

With the advent of multi and many-core processors, communication has replaced computation as the performance bottleneck. Most current approaches to the problem try to tolerate memory access latency through a high amount of Thread-Level Parallelism. However, not all applications benefit from such techniques and there is a need to address the weakness of the underlying […]
Dec, 16

Scaling behavior of topologically constrained polymer rings in a melt

Large scale molecular dynamics simulations on graphic processing units (GPUs) are employed to study the scaling behavior of ring polymers with various topological constraints in melts. Typical sizes of rings containing $3_1$, $5_1$ knots and catenanes made up of two unknotted rings scale like $N^{1/3}$ in the limit of large ring sizes $N$. This is […]
Dec, 16

MatConvNet – Convolutional Neural Networks for MATLAB

MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and many more. In this manner, MatConvNet allows fast prototyping of […]
Dec, 15

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning

The tooling landscape of deep learning is fragmented by a growing gap between the generic and productivity-oriented tools that optimize for algorithm development and the task-specific ones that optimize for speed and scale. This creates an artificial barrier to bring new innovations into real-world applications. Minerva addresses this issue with a layered design that provides […]
Page 20 of 793« First...10...1819202122...304050...Last »

* * *

* * *

Like us on Facebook

HGPU group

230 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1425 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: