high performance computing on graphics processing units: hgpu.org

Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

Simon Heybrock, Balint Joo, Dhiraj D. Kalamkar, Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Tilo Wettig, Pradeep Dubey

Institute for Theoretical Physics, University of Regensburg, Germany

arXiv:1412.2629 [hep-lat] (8 Dec 2014)

DOI:10.1109/SC.2014.11

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single and half precision, the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.

Tags: Algorithms, Computational Physics, High Energy Physics – Lattice, Intel Xeon Phi, Performance, Physics, QCD

December 9, 2014 by hgpu
