Posts
May 11
Theano: A Python framework for fast computation of mathematical expressions
Theano is a Python library that lets users define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most widely used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano is being actively and continuously developed […]
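As a quick illustration of the define/optimize/evaluate workflow the abstract describes, here is a minimal example using the standard Theano API (the expression and variable names are ours):

```python
# Define a symbolic expression, let Theano optimize and compile it
# (for CPU or GPU), then evaluate it on concrete numpy arrays.
import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix('x')              # symbolic double-precision matrices
y = T.dmatrix('y')
z = T.dot(x, y) + T.exp(x)      # symbolic expression graph

f = theano.function([x, y], z)  # optimized, compiled function

a = np.random.randn(2, 2)
b = np.random.randn(2, 2)
print(f(a, b))                  # numerical evaluation
```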
May 11
The GPU-based Parallel Ant Colony System
The Ant Colony System (ACS) is, together with Ant Colony Optimization (ACO) and the MAX-MIN Ant System (MMAS), one of the most efficient metaheuristic algorithms inspired by the behavior of ants. In this article we present three novel parallel versions of the ACS for graphics processing units (GPUs). To the best of our knowledge, […]
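For readers unfamiliar with the ACS itself, a toy sequential sketch of its rules follows (our illustration with textbook parameters, not the authors' GPU code; the paper parallelizes tour construction and pheromone updates like these):

```python
import numpy as np

# Toy sequential Ant Colony System on a random TSP-like instance.
rng = np.random.default_rng(0)
n = 20
dist = rng.random((n, n)) + np.eye(n)      # illustrative distance matrix
eta = 1.0 / dist                           # heuristic desirability
tau0 = 1e-3
tau = np.full((n, n), tau0)                # pheromone matrix
beta, q0, rho, xi = 2.0, 0.9, 0.1, 0.1     # textbook ACS parameters

def construct_tour():
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        i = tour[-1]
        cand = np.array(sorted(unvisited))
        attract = tau[i, cand] * eta[i, cand] ** beta
        if rng.random() < q0:              # exploitation
            j = int(cand[np.argmax(attract)])
        else:                              # biased exploration
            j = int(rng.choice(cand, p=attract / attract.sum()))
        tau[i, j] = (1 - xi) * tau[i, j] + xi * tau0   # local update
        tour.append(j)
        unvisited.remove(j)
    return tour

best, best_len = None, np.inf
for _ in range(100):
    tour = construct_tour()
    length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
    if length < best_len:
        best, best_len = tour, length
    for k in range(n):                     # global update: best tour only
        i, j = best[k], best[(k + 1) % n]
        tau[i, j] = (1 - rho) * tau[i, j] + rho / best_len
print(best_len)
```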
May 9
Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model
In this paper we present microbenchmarks in OpenCL that measure the most important performance characteristics of GPUs. Microbenchmarks aim to measure the individual characteristics that influence performance. First, performance, in operations or bytes per second, is measured as a function of occupancy, yielding an occupancy roofline curve. The curve shows at which […]
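The shape of such a curve can be conveyed with a deliberately simplified model (ours, not the paper's benchmark code): throughput grows roughly linearly with occupancy until enough threads are resident to hide latency, then saturates at the device's peak.

```python
# Simplified occupancy-roofline model (illustrative only): linear rise
# until the latency-hiding "knee", then saturation at the peak rate.
def attainable_gflops(occupancy, peak_gflops, knee_occupancy):
    """occupancy and knee_occupancy in [0, 1]; returns modeled GFLOP/s."""
    return min(peak_gflops, peak_gflops * occupancy / knee_occupancy)

for occ in (0.125, 0.25, 0.5, 1.0):        # hypothetical device numbers
    print(occ, attainable_gflops(occ, peak_gflops=4000.0, knee_occupancy=0.5))
```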
May 9
SecureMed: Secure Medical Computation using GPU-Accelerated Homomorphic Encryption Scheme
Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers, but they cannot be used to compute […]
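To make that limitation concrete, here is a toy additively homomorphic scheme (textbook Paillier with tiny, insecure parameters; the GPU-accelerated scheme in the paper is a different one), where adding plaintexts corresponds to multiplying ciphertexts:

```python
from math import gcd

# Textbook Paillier with tiny, insecure parameters, shown only to
# illustrate computing on encrypted data. Requires Python 3.8+ for
# pow(x, -1, n) modular inverses.
p, q = 47, 59
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # decryption constant

def encrypt(m, r):                              # r: random, coprime to n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

c1, c2 = encrypt(41, 123), encrypt(17, 456)
assert decrypt((c1 * c2) % n2) == 41 + 17       # addition under encryption
```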
May 9
A Graph-based Model for GPU Caching Problems
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling among different threads. Traditionally, in the field of parallel computing, graph partition models are used to model data communication and […]
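In simplified form, such a graph model can look like the following sketch (our illustration, not the paper's formulation): threads are vertices, edge weights count shared data elements, and a partitioner groups heavy sharers so they hit the same cache.

```python
from collections import defaultdict

# Toy sharing graph: vertices are threads, edge weights count the data
# elements two threads both touch; a greedy pass pairs heavy sharers
# so that sharers can be co-scheduled onto the same cache.
accesses = {                 # thread id -> data addresses it touches
    0: {10, 11, 12}, 1: {11, 12, 13},
    2: {40, 41},     3: {41, 42, 12},
}

weight = defaultdict(int)
threads = sorted(accesses)
for i in threads:
    for j in threads:
        if i < j:
            weight[(i, j)] = len(accesses[i] & accesses[j])

paired, groups = set(), []
for (i, j), w in sorted(weight.items(), key=lambda kv: -kv[1]):
    if w and i not in paired and j not in paired:
        groups.append((i, j))
        paired.update((i, j))
groups += [(t,) for t in threads if t not in paired]
print(groups)                # [(0, 1), (2, 3)] -> co-schedule each pair
```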
May 9
Training Neural Networks Without Gradients: A Scalable ADMM Approach
With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that do not scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch […]
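For orientation, the alternating structure of ADMM is easiest to see on a simpler problem than network training; the sketch below applies it to lasso regression (a standard textbook use, not the paper's per-layer formulation):

```python
import numpy as np

# ADMM for lasso: minimize 0.5*||Ax - b||^2 + lam*||x||_1.
# Each step is cheap and local, which is what makes ADMM attractive
# to distribute; the paper splits network layers in a similar spirit.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[:3] = (1.0, -2.0, 0.5)
b = A @ x_true + 0.01 * rng.standard_normal(50)

lam, rho = 0.1, 1.0
x = z = u = np.zeros(10)
Atb = A.T @ b
M = np.linalg.inv(A.T @ A + rho * np.eye(10))  # cached for the x-update

def soft(v, k):                                # proximal map of ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

for _ in range(200):
    x = M @ (Atb + rho * (z - u))              # quadratic subproblem
    z = soft(x + u, lam / rho)                 # separable shrinkage
    u = u + x - z                              # dual update
print(np.round(z, 2))                          # recovers the sparse x_true
```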
May 9
Parallelizing Word2Vec in Shared and Distributed Memory
Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It has recently generated considerable excitement in the machine learning and natural language processing (NLP) communities due to its exceptional performance in many NLP applications, such as named entity recognition, sentiment analysis, machine translation, and question answering. State-of-the-art algorithms, including those by Mikolov […]
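As context for what has to be parallelized, here is a single skip-gram negative-sampling update in plain numpy (a textbook sketch with made-up sizes; the paper's contribution lies in batching many such updates for parallel hardware):

```python
import numpy as np

# One skip-gram negative-sampling (SGNS) update for a (center, context)
# pair plus sampled negative words.
rng = np.random.default_rng(0)
V, D, lr = 1000, 100, 0.025                 # vocab, dimension, step size
Win = 0.01 * rng.standard_normal((V, D))    # center-word vectors
Wout = np.zeros((V, D))                     # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(center, context, negatives):
    v = Win[center]
    v_grad = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(w, 0.0) for w in negatives]:
        u = Wout[word]
        g = lr * (label - sigmoid(u @ v))   # scalar error term
        v_grad += g * u
        Wout[word] += g * v
    Win[center] += v_grad                   # apply accumulated gradient

sgns_update(center=5, context=42, negatives=[7, 99, 300])
```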
May 7
Parallel Wavelet Schemes for Images
In this paper, we introduce several new schemes for the calculation of discrete wavelet transforms of images. These schemes reduce the number of steps and, as a consequence, the number of synchronizations required on parallel architectures. As an additional useful property, the proposed schemes can also reduce the number of arithmetic operations. The schemes […]
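For orientation, a single level of a 2-D discrete wavelet transform looks like the following numpy sketch, using the simple Haar filter rather than the longer filters whose lifting steps (and synchronizations) the paper's schemes reduce:

```python
import numpy as np

# Single-level 2-D Haar DWT: filter and downsample along columns, then
# along rows, producing four half-resolution subbands.
def haar2d(img):
    def split(a):                     # one Haar level along the last axis
        lo = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2)
        hi = (a[..., 0::2] - a[..., 1::2]) / np.sqrt(2)
        return lo, hi
    lo, hi = split(img)
    ll, lh = split(lo.swapaxes(-1, -2))
    hl, hh = split(hi.swapaxes(-1, -2))
    return [s.swapaxes(-1, -2) for s in (ll, lh, hl, hh)]

img = np.arange(64.0).reshape(8, 8)
ll, lh, hl, hh = haar2d(img)
print(ll.shape)                       # (4, 4): coarse approximation band
```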
May 7
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computationally intensive and resource-consuming, and thus are hard to integrate into embedded systems such as smartphones, smart […]
May 7
JIT-Compilation for Interactive Scientific Visualization
Due to the proliferation of mobile devices and cloud computing, remote simulation and visualization have become increasingly important. In order to reduce bandwidth and (de)serialization costs, and to improve mobile battery life, we examine the performance and bandwidth benefits of using an optimizing query compiler for remote postprocessing of interactive and in-situ simulations. We conduct […]
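The core idea can be conveyed with a toy in plain Python (ours, not the paper's compiler, which targets native code): compile a user-supplied postprocessing query once into a callable, then reuse it for every frame instead of re-interpreting it.

```python
import numpy as np

# Toy "query compiler": build a Python function from a postprocessing
# expression once, then apply it per frame. A real system would emit
# optimized native code instead.
def compile_query(expr):
    src = "def _q(data):\n    return " + expr + "\n"
    env = {"np": np}
    exec(compile(src, "<query>", "exec"), env)
    return env["_q"]

velocity_magnitude = compile_query(
    "np.sqrt(data['u']**2 + data['v']**2 + data['w']**2)")

frame = {k: np.random.rand(16, 16, 16) for k in ("u", "v", "w")}
print(velocity_magnitude(frame).max())
```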
May 7
TheanoLM – An Extensible Toolkit for Neural Network Language Modeling
We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text. The tool is written using the Python library Theano, which allows researchers to easily extend it and tune any aspect of the training process. Despite this flexibility, Theano is able to generate extremely fast native code that […]
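To give a flavor of the kind of computation graph such a toolkit builds (our sketch, not TheanoLM's actual internals), here is a one-layer recurrent language model in Theano that scores a sentence as a sum of log-probabilities:

```python
import numpy as np
import theano
import theano.tensor as T

V, H = 1000, 64                                   # made-up sizes
E  = theano.shared(0.01 * np.random.randn(V, H))  # word embeddings
Wh = theano.shared(0.01 * np.random.randn(H, H))  # recurrent weights
Wo = theano.shared(0.01 * np.random.randn(H, V))  # output projection

words = T.ivector('words')                        # sentence as word ids

def step(w, h_prev):
    h = T.tanh(E[w] + T.dot(h_prev, Wh))
    p = T.nnet.softmax(T.dot(h, Wo).reshape((1, V)))[0]
    return h, p

(_, ps), _ = theano.scan(step, sequences=words[:-1],
                         outputs_info=[T.zeros(H), None])
m = words.shape[0] - 1
logprob = T.sum(T.log(ps[T.arange(m), words[1:]]))
score = theano.function([words], logprob)         # compiled native code

print(score(np.array([1, 5, 9, 2], dtype=np.int32)))
```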
May 7
Parallel Pairwise Correlation Computation On Intel Xeon Phi Clusters
Co-expression network analysis is a critical technique for the identification of inter-gene interactions, and it usually relies on all-pairs correlation (or a similar measure) computed between gene expression profiles across multiple samples. Pearson's correlation coefficient (PCC) is one widely used measure for gene co-expression network construction. However, all-pairs PCC computation is computationally demanding for large numbers of gene […]
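Once each expression profile is standardized, all-pairs PCC reduces to one dense matrix product, which is what makes it a natural fit for many-core hardware; a plain numpy sketch:

```python
import numpy as np

# All-pairs Pearson correlation: center and L2-normalize each gene's
# profile, then a single matrix product yields every pairwise PCC.
def allpairs_pcc(X):
    """X: genes x samples; returns the genes x genes PCC matrix."""
    Z = X - X.mean(axis=1, keepdims=True)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    return Z @ Z.T                        # C[i, j] = PCC(gene i, gene j)

X = np.random.rand(1000, 50)              # 1000 genes, 50 samples
C = allpairs_pcc(X)
print(C.shape, np.allclose(np.diag(C), 1.0))
```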