This work describes the challenges presented by porting parts of the Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of some generic micro-benchmark on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela […]

March 22, 2015 by hgpu

We discuss several strategies to implement Dykstra’s projection algorithm on NVIDIA’s compute unified device architecture (CUDA). Dykstra’s algorithm is the central step in and the computationally most expensive part of statistical multi-resolution methods. It projects a given vector onto the intersection of convex sets. Compared with a CPU implementation our CUDA implementation is one order […]

March 14, 2015 by hgpu
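
To illustrate the projection step mentioned in the Dykstra entry above, here is a minimal sequential sketch for two convex sets. It is not the CUDA implementation from the paper. The function names (`project_box`, `project_halfspace`, `dykstra`), the choice of sets (a unit box and a half-space), and the fixed iteration count are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Projection = Callable[[Sequence[float]], Vector]


def project_box(v: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> Vector:
    """Euclidean projection onto the box [lo, hi]^n (illustrative set A)."""
    return [min(max(vi, lo), hi) for vi in v]


def project_halfspace(v: Sequence[float], a: Sequence[float], b: float) -> Vector:
    """Euclidean projection onto the half-space {x : a.x <= b} (illustrative set B)."""
    dot = sum(ai * vi for ai, vi in zip(a, v))
    if dot <= b:
        return list(v)  # already feasible
    norm_sq = sum(ai * ai for ai in a)
    scale = (dot - b) / norm_sq
    return [vi - scale * ai for ai, vi in zip(a, v)]


def dykstra(r: Sequence[float], project_a: Projection, project_b: Projection,
            iterations: int = 100) -> Vector:
    """Dykstra's alternating projections for two convex sets.

    Unlike plain alternating projections, the correction vectors p and q
    are carried between iterations, so the iterates approach the point of
    the intersection that is closest to the input r.
    """
    x = list(r)
    p = [0.0] * len(x)  # correction associated with set A
    q = [0.0] * len(x)  # correction associated with set B
    for _ in range(iterations):
        y = project_a([xi + pi for xi, pi in zip(x, p)])
        p = [xi + pi - yi for xi, pi, yi in zip(x, p, y)]
        x = project_b([yi + qi for yi, qi in zip(y, q)])
        q = [yi + qi - xi for yi, qi, xi in zip(y, q, x)]
    return x


def project_onto_example_sets(r: Sequence[float]) -> Vector:
    """Project r onto the unit box intersected with {x : x1 + x2 <= 1}."""
    return dykstra(r, project_box, lambda v: project_halfspace(v, [1.0, 1.0], 1.0))
```

Calling `project_onto_example_sets([2.0, 2.0])` projects the point onto the region where both constraints hold. A GPU version would presumably parallelize the individual projections, but how the paper does this is not stated in the excerpt.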

This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. UFS is an adaptive kinetic-fluid simulation tool, which combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. The main challenge of porting UFS to graphics processing units (GPUs) comes […]

March 3, 2015 by hgpu

We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number […]

February 22, 2015 by hgpu

Obtaining a thermodynamically accurate phase diagram through numerical calculations is a computationally expensive problem that is crucially important to understanding the complex phenomena of solid state physics, such as superconductivity. In this work we show how this type of analysis can be significantly accelerated through the use of modern GPUs. We illustrate this with a […]

January 2, 2015 by hgpu

This paper describes some applications of GPU acceleration in ab initio nuclear structure calculations. Specifically, we discuss GPU acceleration of the software package MFDn, a parallel nuclear structure eigensolver. We modify the matrix construction stage to run partly on the GPU. On the Titan supercomputer at the Oak Ridge Leadership Computing Facility, this produces a […]

December 22, 2014 by hgpu

We adopt CUDA-capable Graphic Processing Units (GPUs) for Landau, Coulomb and maximally Abelian gauge fixing in 3+1 dimensional SU(3) and SU(2) lattice gauge field theories. A combination of simulated annealing and overrelaxation is used to aim for the global maximum of the gauge functional. We use a fine-grained degree of parallelism to achieve the […]

December 12, 2014 by hgpu

We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson […]

December 12, 2014 by hgpu

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such […]

December 9, 2014 by hgpu

In this work, we present the GPU implementation of the overrelaxation and steepest descent method with Fourier acceleration methods for Landau and Coulomb gauge fixing using CUDA for SU(N) with N>2. A multi-GPU implementation of the overrelaxation method is also presented using MPI and CUDA. The GPU performance was measured on BlueWaters and compared against […]

December 5, 2014 by hgpu

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. […]

December 5, 2014 by hgpu

As a generic example for crystals where the crystal-fluid interface tension depends on the orientation of the interface relative to the crystal lattice axes, the nearest neighbor Ising model on the simple cubic lattice is studied over a wide temperature range, both above and below the roughening transition temperature. Using a thin film geometry $L_x […]

November 25, 2014 by hgpu