## Posts

Sep, 20

### Parallel Computational Fluid Dynamics With the Intel Xeon Phi Coprocessor

The Intel Xeon Phi coprocessor is a PCI Express form factor card designed to work in tandem with Intel Xeon processors to allow faster execution of highly parallelizable code. Efficient execution of highly parallel applications is achieved through the use of many smaller, lower-clock-speed cores, allowing for many more simultaneous execution […]

Sep, 20

### A Compiler for Throughput Optimization of Graph Algorithms on GPUs

Writing high-performance GPU implementations of graph algorithms can be challenging. In this paper, we argue that three optimizations, called throughput optimizations, are key to high performance for this application class. These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand. To address this problem, we have implemented these optimizations […]

Sep, 20

### Feynman Machine: The Universal Dynamical Systems Computer

Efforts at understanding the computational processes in the brain have met with limited success, despite their importance and potential uses in building intelligent machines. We propose a simple new model which draws on recent findings in Neuroscience and the Applied Mathematics of interacting Dynamical Systems. The Feynman Machine is a Universal Computer for Dynamical Systems, […]

Sep, 20

### Runtime Support for Adaptive Power Capping on Heterogeneous SoCs

Power capping is a fundamental method for reducing the energy consumption of a wide range of modern computing environments, ranging from mobile embedded systems to datacentres. Unfortunately, maximising performance and system efficiency under static power caps remains challenging, while maximising performance under dynamic power caps has been largely unexplored. We present an adaptive power capping […]

Sep, 17

### Devito: automated fast finite difference computation

Domain-specific languages have successfully been used in a variety of fields to cleanly express scientific problems as well as to simplify implementation and performance optimization on different computer architectures. Although a large number of stencil languages are available, finite-difference domain-specific languages have proved challenging to design because most practical use cases require […]
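As a minimal hand-written sketch of the kind of finite-difference kernel such a DSL targets (plain Python, not Devito's API), assume the 1-D heat equation u_t = alpha * u_xx with fixed boundary values, discretized with a 3-point central stencil and explicit Euler time stepping. The equation, function name, and parameters are illustrative assumptions, not taken from the paper.

```python
from typing import List


def heat_step(u: List[float], alpha: float, dx: float, dt: float) -> List[float]:
    """One explicit Euler step of u_t = alpha * u_xx (hypothetical example).

    Boundary values u[0] and u[-1] are held fixed (Dirichlet conditions).
    """
    r = alpha * dt / dx**2
    new = u[:]  # copy so boundary values are kept unchanged
    for i in range(1, len(u) - 1):
        # Second-order central difference approximation of u_xx at point i
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new
```

For example, `heat_step([0.0, 0.0, 1.0, 0.0, 0.0], 1.0, 1.0, 0.25)` spreads the spike to its neighbours. This explicit scheme is stable only when r = alpha * dt / dx² ≤ 1/2.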

Sep, 17

### The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better

Accuracy, descriptor size, and the time required for extraction and matching are all important factors when selecting local image descriptors. To optimize over all these requirements, this paper presents a CUDA port of the recent Learned Arrangement of Three Patches (LATCH) binary descriptors to the GPU platform. The design of LATCH makes it well suited […]
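Binary descriptors such as LATCH are usually compared with the Hamming distance, so matching reduces to XOR plus bit counting. A minimal brute-force matcher in plain Python, assuming descriptors are stored as Python integers (an illustrative representation; the paper's CUDA kernels are not reproduced here):

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def brute_force_match(query: List[int], train: List[int]) -> List[Tuple[int, int]]:
    """For each query descriptor, return (index of nearest train descriptor, distance).

    Every (query, train) distance is independent of the others, which is the
    property that lets this step be spread across many GPU threads.
    """
    matches = []
    for q in query:
        dists = [hamming(q, t) for t in train]
        best = min(range(len(train)), key=dists.__getitem__)
        matches.append((best, dists[best]))
    return matches
```

Because Python integers have arbitrary precision, the same code works for any descriptor length.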

Sep, 17

### Parallel Dynamics Computation using Prefix Sum Operations

We propose a new parallel framework for fast computation of inverse and forward dynamics of articulated robots based on prefix sums (scans). We re-investigate the well-known recursive Newton-Euler formulation of robot dynamics and show that the forward-backward propagation process for robot inverse dynamics is equivalent to two scan operations on certain semigroups. We show that […]
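The enabling observation, that a sequential recurrence becomes a prefix sum once its combining operator is associative, can be illustrated with a minimal sketch. Here the semigroup is assumed to be affine maps x -> a*x + b under composition, which solve the linear recurrence x_i = a_i * x_{i-1} + b_i; this is a stand-in chosen for illustration, not the semigroups the paper uses for the Newton-Euler formulation. The scan is the Hillis-Steele variant, written sequentially but organized into rounds whose updates are independent.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Affine = Tuple[int, int]  # (a, b) represents the map x -> a*x + b


def compose(f: Affine, g: Affine) -> Affine:
    """Apply f first, then g. Associative but not commutative."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2, a2 * b1 + b2)


def inclusive_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele inclusive scan in ceil(log2 n) rounds.

    Within a round every update reads only the previous round's values,
    so the inner loop could run in parallel.
    """
    out = list(xs)
    step = 1
    while step < len(out):
        prev = out[:]  # snapshot of the previous round
        for i in range(step, len(out)):
            out[i] = op(prev[i - step], prev[i])
        step *= 2
    return out


def solve_recurrence(maps: List[Affine], x0: int) -> List[int]:
    """Return x_1..x_n for x_i = a_i * x_{i-1} + b_i via the scan."""
    return [a * x0 + b for a, b in inclusive_scan(maps, compose)]
```

For instance, `solve_recurrence([(2, 1), (3, 0), (1, 5)], 1)` returns `[3, 9, 14]`, the same values as evaluating 2·1+1, 3·3+0, and 1·9+5 one after another.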

Sep, 17

### A parallel pattern for iterative stencil + reduce

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy […]
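A minimal sequential sketch of the pattern's control structure, assuming a 1-D grid, a user-supplied stencil function, and a convergence test on a reduced value. The function names and signature are hypothetical, not the paper's API. Jacobi relaxation of the 1-D Laplace equation serves as the example: the stencil averages neighbours, the reduction takes the maximum change, and the loop stops once that change is small.

```python
import functools
from typing import Callable, List, Tuple


def loop_of_stencil_reduce(
    grid: List[float],
    stencil: Callable[[List[float], int], float],
    measure: Callable[[float, float], float],
    reduce_op: Callable[[float, float], float],
    done: Callable[[float], bool],
    max_iters: int = 1000,
) -> Tuple[List[float], int]:
    for it in range(1, max_iters + 1):
        # Stencil phase: each cell reads only a neighbourhood of the previous
        # grid, so all cells could be computed independently.
        new = [stencil(grid, i) for i in range(len(grid))]
        # Reduce phase: fold a per-cell measure into one scalar.
        acc = functools.reduce(reduce_op, (measure(o, n) for o, n in zip(grid, new)))
        grid = new
        # Loop phase: the termination test uses the reduced value.
        if done(acc):
            return grid, it
    return grid, max_iters


def jacobi(g: List[float], i: int) -> float:
    """Keep boundary cells fixed; interior cells take the mean of their neighbours."""
    if i == 0 or i == len(g) - 1:
        return g[i]
    return 0.5 * (g[i - 1] + g[i + 1])
```

Calling `loop_of_stencil_reduce([0.0, 0.0, 0.0, 0.0, 1.0], jacobi, lambda o, n: abs(n - o), max, lambda d: d < 1e-9)` relaxes the grid towards the linear ramp between the fixed boundary values 0 and 1.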

Sep, 17

### Finding faint HI structure in and around galaxies: scraping the barrel

Soon-to-be-operational HI survey instruments such as APERTIF and ASKAP will produce large datasets. These surveys will provide information about the HI in and around hundreds of galaxies with a typical signal-to-noise ratio of ~10 in the inner regions and ~1 in the outer regions. In addition, such surveys will make it possible […]

Sep, 17

### Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural-network-based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. Our approach further reduces the computational cost by […]
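The core factorization can be sketched as follows, assuming a two-level split. Frequent words get their own entries in a head softmax, which also holds one entry per tail cluster, and a rare word's probability is P(cluster | h) · P(word | cluster, h). The function name and data layout are illustrative assumptions, and the sketch omits how the paper chooses the clusters to minimize expected computation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (shift by the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def two_level_prob(
    word: int,
    head_words: List[int],
    tail_clusters: List[List[int]],
    head_logits: List[float],
    tail_logits: List[List[float]],
) -> float:
    """Probability of `word` under a hypothetical two-level softmax.

    head_logits has one score per frequent word, followed by one score per
    tail cluster; tail_logits[c] has one score per word in tail_clusters[c].
    """
    head_p = softmax(head_logits)
    if word in head_words:
        # Frequent word: only the small head softmax is needed.
        return head_p[head_words.index(word)]
    for c, cluster in enumerate(tail_clusters):
        if word in cluster:
            # Rare word: P(cluster c | h) * P(word | cluster c, h)
            cluster_p = softmax(tail_logits[c])
            return head_p[len(head_words) + c] * cluster_p[cluster.index(word)]
    raise KeyError(word)
```

For a frequent target word only the head softmax is evaluated; a tail cluster's softmax is computed only when the target falls in that cluster, which is where the savings over a full-vocabulary softmax come from.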

Sep, 16

### Agent-Based Modeling on High Performance Computing Architectures

In spatial agent-based models (SABMs), each entity of the system being modeled is uniquely represented as an independent agent. Large-scale emergent behavior in SABMs is population sensitive. Thus, the number of agents should reflect the system being modeled, which can be on the order of billions. Models can be decomposed such that each component […]

Sep, 13

### Data Analysis of Minimally-Structured Heterogeneous Logs: An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes

Nowadays, continuous integration and continuous delivery are widely used to achieve rapid software development and quick delivery of good-quality products to customers. In modern software development, the testing stage has always been of great significance in ensuring that the delivered software meets all the […]