high performance computing on graphics processing units: hgpu.org

Code Optimization on Kepler GPUs and Xeon Phi

Yong-Chull Jang, Hwancheol Jeong, Jangho Kim, Weonjong Lee, Jeonghwan Pak, Yuree Chung (SWME Collaboration)

Lattice Gauge Theory Research Center, CTP, and FPRD, Department of Physics and Astronomy, Seoul National University, Seoul, 151-747, South Korea

arXiv:1411.2223 [hep-lat], (9 Nov 2014)

Kepler GTX Titan Black and Kepler Tesla K40 are still the best GPUs for high performance computing, although Maxwell GPUs such as the GTX 980 are available on the market. Hence, we measure the performance of our lattice QCD codes using the Kepler GPUs. We also upgrade our code to use the latest CPS (Columbia Physics System) library along with the most recent QUDA (QCD CUDA) library for lattice QCD. These new libraries improve the performance of our conjugate gradient (CG) inverter so that it runs twice as fast as before. We also investigate the performance of the Xeon Phi 7120P coprocessor. In principle, it has computing power similar to that of the Kepler GPUs. However, its performance for our CG code is at present significantly inferior to that of the GTX Titan Black GPUs.

Tags: CUDA, High Energy Physics – Lattice, Intel Xeon Phi, Maxwell’s equations, MPI, nVidia, nVidia GeForce GTX 980, nVidia GeForce GTX Titan, OpenMP, Performance, Physics, QCD, Tesla K40

November 12, 2014 by hgpu
