## Posts

Jan, 23

### GPGPU Performance Estimation with Core and Memory Frequency Scaling

Graphics Processing Units (GPUs) support dynamic voltage and frequency scaling (DVFS) to balance computational performance and energy consumption. However, simple and accurate performance estimation of a given GPU kernel under different frequency settings on real hardware is still lacking, even though it is important for choosing the best frequency configuration for energy saving. This paper reveals […]

Jan, 23

### A task-driven implementation of a simple numerical solver for hyperbolic conservation laws

This article describes the implementation of an all-in-one numerical procedure within the StarPU runtime. To limit the complexity of the method and keep the presentation of the non-classical task-driven programming environment clear, we have limited the numerics to first order in space and time. Results show that the task distribution […]
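The abstract gives neither the equations nor the scheme, so the following is only a generic sketch of what "first order in space and time" usually means: a finite-volume update with forward Euler in time and a Rusanov (local Lax-Friedrichs) flux, applied here to the 1D Burgers equation with periodic boundaries. The function names, the flux choice, and the test problem are assumptions, and no StarPU API is used. The flux loop and the update loop are the natural units one might split into per-block tasks in a task-driven runtime.

```python
import math
from typing import Callable, List

Flux = Callable[[float], float]


def rusanov_flux(ul: float, ur: float, f: Flux, df: Flux) -> float:
    """Numerical flux at the interface between left state ul and right state ur."""
    s = max(abs(df(ul)), abs(df(ur)))  # local maximum wave speed
    return 0.5 * (f(ul) + f(ur)) - 0.5 * s * (ur - ul)


def step(u: List[float], dx: float, dt: float, f: Flux, df: Flux) -> List[float]:
    """One forward-Euler step of a first-order finite-volume scheme (periodic)."""
    n = len(u)
    # flux[i] sits on the interface between cell i and cell i+1 (wrapping around)
    flux = [rusanov_flux(u[i], u[(i + 1) % n], f, df) for i in range(n)]
    # for i = 0, flux[i - 1] is flux[-1], i.e. the wrap-around interface
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


def burgers_demo(n: int = 100, t_end: float = 0.2, cfl: float = 0.5) -> List[float]:
    """Evolve u0(x) = sin(2*pi*x) on [0, 1) under Burgers' equation (hypothetical demo)."""
    def f(v: float) -> float:   # Burgers flux
        return 0.5 * v * v

    def df(v: float) -> float:  # its derivative, the wave speed
        return v

    dx = 1.0 / n
    u = [math.sin(2 * math.pi * (i + 0.5) * dx) for i in range(n)]
    t = 0.0
    while t < t_end - 1e-12:
        speed = max(abs(v) for v in u) or 1.0
        dt = min(cfl * dx / speed, t_end - t)  # CFL-limited time step
        u = step(u, dx, dt, f, df)
        t += dt
    return u
```

Because each update is a difference of interface fluxes, the sum of the cell values is conserved up to rounding, which is a cheap sanity check for any task decomposition of the loops.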

Jan, 23

### Multi-core parallelism in a column-store

The research reported in this thesis addresses several challenges in improving the efficiency and effectiveness of parallel processing of analytical database queries on modern multi- and many-core systems, using MonetDB, an open-source column-oriented analytical database management system, for validation. In contrast to existing work, we also broaden the research from focusing on individual operators […]

Jan, 19

### Deep Learning for Computational Chemistry

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen […]

Jan, 19

### OpenNMT: Open-Source Toolkit for Neural Machine Translation

We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about […]

Jan, 19

### Xeon Phi: A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems

As Moore's law continues, processors keep getting more cores packed onto the chip. This thesis is an empirical study of the recently introduced Intel Many Integrated Core (IMIC) architecture found in the Intel Xeon Phi. With roughly 60 cores connected by a high-performance on-die interconnect, the Intel Xeon Phi makes an […]

Jan, 19

### Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms. Our underlying idea is to interpret bit-parallel algorithms as inclusive-scan operations, which allow these bit-parallel algorithms to run efficiently on a graphics processing unit […]
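The abstract names Shift-Or and the idea of treating bit-parallel matching as an inclusive scan, but shows no code. As a sketch under our own assumptions (not the authors' GPU implementation), the first function below is the textbook sequential Shift-Or recurrence `D = (D << 1) | mask[c]`. The second rewrites each per-character update as a map `x -> ((x << k) | b)` truncated to m bits; composing two such maps gives another map of the same form, and that composition is associative, so all prefix states can be produced by an inclusive scan. Here Python's `itertools.accumulate` stands in for a parallel GPU scan.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def build_masks(pattern: str) -> Dict[str, int]:
    """Bit i of masks[c] is 0 exactly when pattern[i] == c (Shift-Or convention)."""
    all_ones = (1 << len(pattern)) - 1
    masks: Dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)
    return masks


def shift_or_search(text: str, pattern: str) -> List[int]:
    """Sequential Shift-Or: start indices of exact matches of pattern in text."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    all_ones = (1 << m) - 1
    masks = build_masks(pattern)
    state, hits = all_ones, []
    for j, c in enumerate(text):
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if not state & (1 << (m - 1)):  # top bit 0: a full match ends at j
            hits.append(j - m + 1)
    return hits


def shift_or_scan(text: str, pattern: str) -> List[int]:
    """Same result, with the recurrence expressed as an inclusive scan."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    all_ones = (1 << m) - 1
    masks = build_masks(pattern)

    # Each character is the map x -> ((x << k) | b) & all_ones, stored as (k, b).
    def compose(f: Tuple[int, int], g: Tuple[int, int]) -> Tuple[int, int]:
        (k1, b1), (k2, b2) = f, g  # apply f first, then g
        # shifting by >= m clears all m bits, so capping k at m is exact
        return (min(k1 + k2, m), ((b1 << k2) | b2) & all_ones)

    ops = [(1, masks.get(c, all_ones)) for c in text]
    hits = []
    for j, (k, b) in enumerate(accumulate(ops, compose)):
        state = ((all_ones << k) | b) & all_ones  # prefix map applied to initial state
        if not state & (1 << (m - 1)):
            hits.append(j - m + 1)
    return hits
```

For example, both functions report matches of `"abra"` in `"abracadabra"` at positions 0 and 7. The `(k, b)` pair representation is just one convenient way to make the operator closed under composition.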

Jan, 19

### Light Loss-Less Data Compression, with GPU Implementation

There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call Light Loss-Less (LLL) compression. It […]

Jan, 16

### An OpenCL(TM) Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means of performing vision tasks, particularly image classification. FPGAs are well known to perform convolutions efficiently; however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have […]

Jan, 16

### Using efficient parallelization in Graphic Processing Units to parameterize stochastic fire propagation models

Fire propagation is a major concern worldwide, and in Argentinian northwestern Patagonia in particular, where every year hundreds of hectares are affected by both natural and anthropogenic forest fires. We developed an efficient cellular automata model on Graphics Processing Units (GPUs) to simulate fire propagation. The graphical advantages of GPUs were […]
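The abstract does not spell out the model's rules, so the following is a deliberately simple stochastic cellular automaton built on our own assumptions: three cell states, a single hypothetical ignition probability `p` applied per burning 4-neighbour, and burning cells that burn out after one step. Every cell computes its next state from the previous grid only and writes only itself, the same synchronous one-thread-per-cell pattern a GPU kernel would use. Parameterizing a real model would mean fitting values like `p` to observed fires.

```python
import random
from typing import List

UNBURNED, BURNING, BURNED = 0, 1, 2
Grid = List[List[int]]


def fire_step(grid: Grid, p: float, rng: random.Random) -> Grid:
    """Synchronous update: each cell reads the old grid and writes only itself."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == BURNING:
                new[r][c] = BURNED  # assumption: fire lasts exactly one step
            elif grid[r][c] == UNBURNED:
                # each burning 4-neighbour independently tries to ignite this cell
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and grid[rr][cc] == BURNING and rng.random() < p):
                        new[r][c] = BURNING
                        break
    return new


def simulate(rows: int, cols: int, p: float, seed: int = 0) -> int:
    """Ignite the centre cell, run until no cell burns, return the burned-cell count."""
    rng = random.Random(seed)
    grid = [[UNBURNED] * cols for _ in range(rows)]
    grid[rows // 2][cols // 2] = BURNING
    while any(BURNING in row for row in grid):
        grid = fire_step(grid, p, rng)
    return sum(row.count(BURNED) for row in grid)
```

Running `simulate` many times with different seeds gives a distribution of burned areas for each `p`, which is the kind of repeated, independent simulation that benefits from GPU parallelism when fitting parameters.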

Jan, 16

### Application of GPU Computing to Some Urban Traffic Problems

The present work studies and proposes GPU-based parallel algorithms and implementations for the problem of macroscopic assignment of urban traffic on large-scale networks, with an in-depth investigation of each sub-problem that must be efficiently solved during the traffic assignment process. The main contributions of this work include: 1) the first GPU-based algorithm for […]

Jan, 16

### An N log N Parallel Fast Direct Solver for Kernel Matrices

Kernel matrices appear in machine learning and non-parametric statistics. Given N points in d dimensions and a kernel function that requires $\mathcal{O}(d)$ work to evaluate, we present an $\mathcal{O}(dN\log N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, […]
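To make the cost statement concrete, the sketch below builds a regularized kernel matrix $K + \lambda I$ for a Gaussian kernel (each entry costs $\mathcal{O}(d)$ to evaluate) and solves a system with a dense Cholesky factorization, which takes $\mathcal{O}(dN^2)$ work to assemble and $\mathcal{O}(N^3)$ to factor. This is the dense baseline that a fast direct solver is meant to avoid, not the paper's $\mathcal{O}(dN\log N)$ algorithm; the Gaussian kernel, the bandwidth `h`, and the regularization `lam` are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_kernel(x: List[float], y: List[float], h: float) -> float:
    """O(d) work per evaluation."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * h * h))


def regularized_kernel_matrix(points: List[List[float]], h: float, lam: float) -> Matrix:
    """K + lam * I; symmetric positive definite for lam > 0. O(d N^2) work."""
    n = len(points)
    return [[gaussian_kernel(points[i], points[j], h) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a: Matrix) -> Matrix:
    """Dense lower-triangular factor L with a = L L^T. O(N^3) work."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def cholesky_solve(L: Matrix, b: List[float]) -> List[float]:
    """Solve L L^T x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

The regularization term $\lambda I$ keeps the system well conditioned even when many points are close together, since the Gaussian kernel matrix on its own is only positive semidefinite.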