Krzysztof Wolk, Krzysztof Marasek
The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely-available resource, but they are limited and do not provide enough coverage for good quality translation purposes, due to out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which are unfortunately dependent on […]
View View   Download Download (PDF)   
Andrew Lavin
We derive a new class of fast algorithms for convolutional neural networks using Winograd’s minimal filtering algorithms. Specifically we derive algorithms for network layers with 3×3 kernels, which are the preferred kernel size for image recognition tasks. The best of our algorithms reduces arithmetic complexity up to 4X compared with direct convolution, while using small […]
View View   Download Download (PDF)   
Shengren Li, Nina Amenta
We present a brute-force approach for finding k-nearest neighbors on the GPU for many queries in parallel. Our program takes advantage of recent advances in fundamental GPU computing primitives. We modify a matrix multiplication subroutine in MAGMA library [6] to calculate the squared Euclidean distances between queries and references. The nearest neighbors selection is accomplished […]
View View   Download Download (PDF)   
Ang Li, Radu Serban, Dan Negrut
We discuss an approach for solving sparse or dense banded linear systems ${bf A} {bf x} = {bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${bf A} in {mathbb{R}}^{N times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 leq N leq 500000$. The ${it split and parallelize}$ (${tt SaP}$) approach seeks […]
Mujeeb B Jimoh
Insider threat is one of the risks both government and private organizations have to deal with in protecting their important information. Data exfiltration and data leakage resulting from insiders activities can be very difficult to identify and quantify. Unfortunately, existing solutions that efficiently check whether data moving across a network is known to be sensitive […]
View View   Download Download (PDF)   
Shuoxin Lin
Dataflow models are valuable tools for representing, analyzing, and synthesizing embedded systems. Heterogeneous computing platforms with multi-core CPU and Graphics Processing Units (GPUs) provide a low cost platform for high performance computations. In this report, we present a dataflow based automated design framework that incorporates analysis, optimization and synthesis tools for embedded systems. Our framework […]
View View   Download Download (PDF)   
Erik Boss
The intractability of the ECDLP is part of what makes many cryptographic application work. As such, viewing this problem from as many angles as possible is worthwhile. In this thesis, we explore the angle of creating a GPU ECDLP solver using OpenCL. In the process, we discuss the many issues, limitations and solutions we encounter. […]
View View   Download Download (PDF)   
Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kim, Hadi Esmaeilzadeh
A growing number of commercial and enterprise systems increasingly rely on compute-intensive machine learning algorithms. While the demand for these compute-intensive applications is growing, the performance benefits from general-purpose platforms are diminishing. To accommodate the needs of machine learning algorithms, Field Programmable Gate Arrays (FPGAs) provide a promising path forward and represent an intermediate point […]
View View   Download Download (PDF)   
John-Alexander M. Assael
Data-efficient learning in continuous state-action spaces using high-dimensional observations remains an elusive challenge in developing fully autonomous systems. An instance of this challenge is the pixels to torques problem, which identifies key elements of an autonomous agent: autonomous thinking and decision making using sensor measurements only, learning from mistakes, and applying past experiences to novel […]
Jen-Cheng Huang
Architecture simulation is an important performance modeling approach. Modeling hardware components with sufficient detail helps architects to identify both hardware and software bottlenecks. However, the major issue of architectural simulation is the huge slowdown compared to native execution. The slowdown gets higher for the emerging workloads that feature high throughput and massive parallelism, such as […]
View View   Download Download (PDF)   
Joao Filipe Ferreira, Pablo Lanillos, Jorge Dias
In this text, we present the principles that allow the tractable implementation of exact inference processes concerning a group of widespread classes of Bayesian generative models, which have until recently been deemed as intractable whenever formulated using high-dimensional joint distributions. We will demonstrate the usefulness of such a principled approach with an example of real-time […]
View View   Download Download (PDF)   
Hasmik Osipyan, Martin Krulis, Stephane Marchand-Maillet
The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction which is defined as a process of mapping data from high-dimensional space into low-dimensional. One of the most popular methods for handling this problem is multidimensional scaling. Due to the technological advances, the dimensionality of the input data as […]
View View   Download Download (PDF)   
Page 1 of 52912345...102030...Last »

* * *

* * *

Follow us on Twitter

HGPU group

1577 peoples are following HGPU @twitter

Like us on Facebook

HGPU group

293 people like HGPU on Facebook

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: