high performance computing on graphics processing units: hgpu.org


MILC Code Performance on High End CPU and GPU Supercomputer Clusters

Ruizi Li, Carleton DeTar, Steven Gottlieb, Doug Toussaint

Department of Physics & Astronomy, University of Utah, Salt Lake City, UT 84112, U.S.A.

arXiv:1712.00143 [hep-lat], (1 Dec 2017)


With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism and of memory hierarchy, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider the performance of the MILC code with MPI and OpenMP, as well as optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and the gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

Tags: Computational Physics, CUDA, High Energy Physics – Lattice, Intel Xeon Phi, nVidia, Performance, Physics, QCD, Tesla K20, Tesla P100

December 7, 2017 by hgpu
