Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

Dongyou Seo, Shin-gyu Kim, Hyeonsang Eom, Heon Y. Yeom
School of Computer Science and Engineering, Seoul National University, Seoul, Korea
International Journal on Computer Science and Engineering (IJCSE), Vol. 5 No. 05, 2013





Today, many studies exploit the high-performance computing capability of GPUs for complex computation and big-data processing. The Tesla K20X, recently announced by NVIDIA, provides 3.95 TFLOPS of single-precision floating-point performance [1], roughly 10 times higher than that of Intel's high-end CPUs. Owing to this capability, the K20X was adopted in Titan, the fastest supercomputer in the world [2][3]. However, GPU computing requires additional steps that CPU-only computation does not: the data needed for GPU execution must be moved from main memory to the GPU's global memory before the computation begins, and the results produced on the GPU must be written back to main memory afterwards. This data movement is called CPU-GPU communication, and it constitutes a significant part of GPU-accelerated computation, so many studies have tried to optimize it [4][5]. In this paper, we evaluate the performance of CPU-GPU communication depending on co-located workloads and identify which workloads severely degrade it.
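The two extra data-movement steps described above can be sketched with a minimal CUDA program (the `scale` kernel and sizes are illustrative, not taken from the paper); the two `cudaMemcpy` calls are the CPU-GPU communication whose performance the paper evaluates.

```cuda
// Minimal sketch of the GPU computing process (assumes the CUDA toolkit).
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative kernel: doubles every element in place.
__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);

    // Step 1: CPU-GPU communication, main memory -> GPU global memory.
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // The actual GPU computation.
    scale<<<(n + 255) / 256, 256>>>(d, n);

    // Step 2: CPU-GPU communication, GPU global memory -> main memory.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    printf("h[0] = %f\n", h[0]);
    cudaFree(d);
    free(h);
    return 0;
}
```

Both copies traverse the PCIe bus, which is why their throughput is sensitive to contention from workloads co-located on the same host.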
