## Posts

Nov, 10

### Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries

We describe a high-performance implementation of the lattice Boltzmann method (LBM) for sparse 3D geometries on graphics processing units (GPUs). The main contribution of this work is a data layout that minimises the number of redundant memory transactions during the propagation step of LBM. We show that by using a uniform mesh of small […]

Nov, 8

### Balancing locality and concurrency: solving sparse triangular systems on GPUs

Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while on multi-core CPUs, data locality has been prioritised to the disadvantage […]

Nov, 8

### Tamp: A Library for Compact Deep Neural Networks with Structured Matrices

We introduce Tamp, an open-source C++ library for reducing the space and time costs of deep neural network models. In particular, Tamp implements several recent methods that use structured matrices to replace the unstructured matrices that are often bottlenecks in neural networks. Tamp is also designed to serve as a unified development platform with several […]

Nov, 8

### Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos

The subject of this report is the performance portability of the Aeras global atmosphere dynamical core (implemented within the Albany multi-physics code) to machines with new and emerging architectures, using the Kokkos library and programming model. We describe the process of refactoring the finite element assembly process for the 3D hydrostatic model in Aeras and highlight […]

Nov, 8

### Accelerate Deep Learning Inference with MCTS in the game of Go on the Intel Xeon Phi

The performance of deep learning inference is a serious issue when it is combined with speed-sensitive Monte Carlo Tree Search (MCTS). The traditional hybrid CPU and graphics processing unit (GPU) solution is limited by frequent, heavy data transfers. This paper proposes a method that runs deep convolutional neural network prediction and MCTS execution simultaneously on the Intel Xeon Phi. This […]

Nov, 8

### Vispark: GPU-Accelerated Distributed Visual Computing Using Spark

With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. In order to address these problems, […]

Nov, 5

### UNICORN: A Bulk Synchronous Programming Model, Framework and Runtime for Hybrid CPU-GPU Clusters

The rapid evolution of graphics processing units (GPUs) into general-purpose computing devices has made them vital to high-performance computing clusters. These computing environments consist of multiple nodes connected by a high-speed network such as InfiniBand, with each node comprising several multi-core processors and several many-core accelerators. The difficulty of programming hybrid CPU-GPU clusters […]

Nov, 5

### HPVM: A Portable Virtual Instruction Set for Heterogeneous Parallel Systems

We describe a programming abstraction for heterogeneous parallel hardware, designed to capture a wide range of popular parallel hardware, including GPUs, vector instruction sets and multicore CPUs. Our abstraction, which we call HPVM, is a hierarchical dataflow graph with shared memory and vector instructions. We use HPVM to define both a virtual instruction set (ISA) […]

Nov, 5

### Molecular Activity Prediction using Deep Learning Software Library

To understand how deep learning methods work on chemoinformatics and bioinformatics problems, we attempted to predict molecular activities using the molecular fingerprints (chemical descriptor vectors) provided by the "Merck molecular activity challenge" competition and the open-source deep learning library Chainer. Our results were able to reproduce almost identical increase-decrease […]

Nov, 5

### grim: A Flexible, Conservative Scheme for Relativistic Fluid Theories

Hot, diffuse, relativistic plasmas such as sub-Eddington black hole accretion flows are expected to be collisionless, yet are commonly modeled as a fluid using ideal general relativistic magnetohydrodynamics (GRMHD). Dissipative effects such as heat conduction and viscosity can be important in a collisionless plasma and will potentially alter the dynamics and radiative properties of the […]

Nov, 5

### A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Sorting is at the core of many database operations, such as index creation, sort-merge joins and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. Over the past few years, several improvements have been proposed for sorting on GPUs, leading to the […]

Nov, 3

### Extensions and Limitations of the Neural GPU

The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. […]