high performance computing on graphics processing units: hgpu.org

Efficient implementation of the overlap operator on multi-GPUs

Andrei Alexandru, Michael Lujan, Craig Pelissier, Ben Gamari, Frank X. Lee

Department of Physics, The George Washington University, 725 21st St. NW, Washington, DC 20052

arXiv:1106.4964v1 [hep-lat] (24 Jun 2011)

Lattice QCD calculations were among the first applications to show the potential of GPUs in high performance computing. Our interest is in finding ways to use GPUs effectively for lattice calculations with the overlap operator. The large memory footprint of these codes requires the use of multiple GPUs in parallel. In this paper we show the methods we used to implement this operator efficiently. We run our codes on both a GPU cluster and a CPU cluster with similar interconnects. We find that, to match the performance of the GPU cluster, the CPU cluster requires 20-30 times more CPU cores than GPUs.

Tags: GPU cluster, High Energy Physics – Lattice, MPI, nVidia, Performance, Physics, Tesla M2070

June 27, 2011 by hgpu
