high performance computing on graphics processing units: hgpu.org

Implementing the Himeno benchmark with CUDA on GPU clusters

Everett H. Phillips, Massimiliano Fatica

NVIDIA, US

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010. Publisher: IEEE. Pages: 1-10

DOI:10.1109/IPDPS.2010.5470394

This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that uses MPI alongside CUDA streams to overlap GPU execution with data transfers allows linear scaling and performs at over 800 GFlops on a cluster with 16 GPUs. The paper presents the optimizations required to achieve this level of performance.
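
As a rough consistency check on the linear-scaling claim, using only the rounded figures quoted in the abstract (not measurements taken from the paper itself), ideal scaling of the single-GPU rate to 16 GPUs would give:

```latex
P_{16}^{\text{ideal}} = 16 \times P_{1} \approx 16 \times 50\ \text{GFlops} = 800\ \text{GFlops}
```

The symbols P_1 (single-GPU rate) and P_16^ideal (ideal 16-GPU rate) are labels introduced here for illustration only. The reported multi-GPU rate of over 800 GFlops is in line with this estimate, which is what linear scaling would predict.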

Tags: Benchmarking, Computer science, CUDA, GPU cluster, nVidia, Performance, Tesla C1060

March 16, 2011 by hgpu
