Posts

Nov, 8

Performance Portability of the Aeras Atmosphere Model to Next Generation Architectures using Kokkos

The subject of this report is the performance portability of the Aeras global atmosphere dynamical core (implemented within the Albany multi-physics code) to new and emerging architecture machines using the Kokkos library and programming model. We describe the process of refactoring the finite element assembly process for the 3D hydrostatic model in Aeras and highlight […]
Nov, 8

Accelerate Deep Learning Inference with MCTS in the game of Go on the Intel Xeon Phi

The performance of deep learning inference is a serious issue when it is combined with speed-sensitive Monte Carlo Tree Search (MCTS). Traditional hybrid CPU and graphics processing unit (GPU) solutions are limited by frequent, heavy data transfers. This paper proposes a method for running Deep Convolutional Neural Network prediction and MCTS execution simultaneously on the Intel Xeon Phi. This […]
Nov, 8

Vispark: GPU-Accelerated Distributed Visual Computing Using Spark

With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. In order to address these problems, […]
Nov, 5

UNICORN: A Bulk Synchronous Programming Model, Framework and Runtime for Hybrid CPU-GPU Clusters

The rapid evolution of graphics processing units (GPUs) into general-purpose computing devices has made them vital to high performance computing clusters. These computing environments consist of multiple nodes connected by a high-speed network such as InfiniBand, with each node comprising several multi-core processors and several many-core accelerators. The difficulty of programming hybrid CPU-GPU clusters […]
Nov, 5

HPVM: A Portable Virtual Instruction Set for Heterogeneous Parallel Systems

We describe a programming abstraction for heterogeneous parallel hardware, designed to capture a wide range of popular parallel hardware, including GPUs, vector instruction sets and multicore CPUs. Our abstraction, which we call HPVM, is a hierarchical dataflow graph with shared memory and vector instructions. We use HPVM to define both a virtual instruction set (ISA) […]
Nov, 5

Molecular Activity Prediction using Deep Learning Software Library

To understand how deep learning methods perform on chemoinformatics and bioinformatics problems, we attempted to predict molecular activities using the molecular fingerprints (chemical descriptor vectors) provided by the "Merck Molecular Activity Challenge" competition and Chainer, an open-source deep learning library. Our results reproduce an almost identical increase-decrease […]
Nov, 5

grim: A Flexible, Conservative Scheme for Relativistic Fluid Theories

Hot, diffuse, relativistic plasmas such as sub-Eddington black hole accretion flows are expected to be collisionless, yet are commonly modeled as a fluid using ideal general relativistic magnetohydrodynamics (GRMHD). Dissipative effects such as heat conduction and viscosity can be important in a collisionless plasma and will potentially alter the dynamics and radiative properties of the […]
Nov, 5

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Sorting is at the core of many database operations, such as index creation, sort-merge joins and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. Over the past few years, several improvements have been proposed for sorting on GPUs, leading to the […]
Nov, 3

Extensions and Limitations of the Neural GPU

The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. […]
Nov, 3

Diplomat: Mapping of multi-kernel applications using a static dataflow abstraction

In this paper we propose a novel approach to the programmability of heterogeneous embedded systems using a taskgraph-based framework called Diplomat. Diplomat is a taskgraph framework that exploits the potential of static dataflow modeling and analysis to deliver performance estimation and CPU/GPU mapping. An application has to be specified once, and then the framework can automatically […]
Nov, 3

MILC staggered conjugate gradient performance on Intel KNL

We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second generation Intel Xeon Phi processor. It is capable of massive thread parallelism, data parallelism, and high on-board memory bandwidth and is being adopted in supercomputing […]
Nov, 3

A hybrid algorithm for parallel molecular dynamics simulations

This article describes an algorithm for hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-ranged forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org