The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely-available resource, but they are limited and do not provide enough coverage for good quality translation purposes, due to out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which are unfortunately dependent on […]

October 3, 2015 by hgpu

We derive a new class of fast algorithms for convolutional neural networks using Winograd’s minimal filtering algorithms. Specifically we derive algorithms for network layers with 3×3 kernels, which are the preferred kernel size for image recognition tasks. The best of our algorithms reduces arithmetic complexity up to 4X compared with direct convolution, while using small […]

October 3, 2015 by hgpu

We present a brute-force approach for finding k-nearest neighbors on the GPU for many queries in parallel. Our program takes advantage of recent advances in fundamental GPU computing primitives. We modify a matrix multiplication subroutine in MAGMA library [6] to calculate the squared Euclidean distances between queries and references. The nearest neighbors selection is accomplished […]

October 3, 2015 by hgpu

We discuss an approach for solving sparse or dense banded linear systems ${bf A} {bf x} = {bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${bf A} in {mathbb{R}}^{N times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 leq N leq 500000$. The ${it split and parallelize}$ (${tt SaP}$) approach seeks […]

September 30, 2015 by hgpu

Insider threat is one of the risks both government and private organizations have to deal with in protecting their important information. Data exfiltration and data leakage resulting from insiders activities can be very difficult to identify and quantify. Unfortunately, existing solutions that efficiently check whether data moving across a network is known to be sensitive […]

September 29, 2015 by hgpu

Dataflow models are valuable tools for representing, analyzing, and synthesizing embedded systems. Heterogeneous computing platforms with multi-core CPU and Graphics Processing Units (GPUs) provide a low cost platform for high performance computations. In this report, we present a dataflow based automated design framework that incorporates analysis, optimization and synthesis tools for embedded systems. Our framework […]

September 29, 2015 by hgpu

The intractability of the ECDLP is part of what makes many cryptographic application work. As such, viewing this problem from as many angles as possible is worthwhile. In this thesis, we explore the angle of creating a GPU ECDLP solver using OpenCL. In the process, we discuss the many issues, limitations and solutions we encounter. […]

September 29, 2015 by hgpu

A growing number of commercial and enterprise systems increasingly rely on compute-intensive machine learning algorithms. While the demand for these compute-intensive applications is growing, the performance benefits from general-purpose platforms are diminishing. To accommodate the needs of machine learning algorithms, Field Programmable Gate Arrays (FPGAs) provide a promising path forward and represent an intermediate point […]

September 29, 2015 by hgpu

Data-efficient learning in continuous state-action spaces using high-dimensional observations remains an elusive challenge in developing fully autonomous systems. An instance of this challenge is the pixels to torques problem, which identifies key elements of an autonomous agent: autonomous thinking and decision making using sensor measurements only, learning from mistakes, and applying past experiences to novel […]

September 26, 2015 by hgpu

Architecture simulation is an important performance modeling approach. Modeling hardware components with sufficient detail helps architects to identify both hardware and software bottlenecks. However, the major issue of architectural simulation is the huge slowdown compared to native execution. The slowdown gets higher for the emerging workloads that feature high throughput and massive parallelism, such as […]

September 26, 2015 by hgpu

In this text, we present the principles that allow the tractable implementation of exact inference processes concerning a group of widespread classes of Bayesian generative models, which have until recently been deemed as intractable whenever formulated using high-dimensional joint distributions. We will demonstrate the usefulness of such a principled approach with an example of real-time […]

September 26, 2015 by hgpu

The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction which is defined as a process of mapping data from high-dimensional space into low-dimensional. One of the most popular methods for handling this problem is multidimensional scaling. Due to the technological advances, the dimensionality of the input data as […]

September 26, 2015 by hgpu