high performance computing on graphics processing units: hgpu.org

Towards Efficient GPU Sharing on Multicore Processors

Lingyuan Wang, Miaoqing Huang, Tarek El-Ghazawi

ECE Department, George Washington University

The 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS11), 2011

Scalable systems employing a mix of GPUs with CPUs are becoming increasingly prevalent in high-performance computing (HPC). The presence of such accelerators introduces significant challenges and complexities to both language developers and end users. This paper provides a close study of efficient coordination mechanisms to handle parallel requests from multiple hosts of control to a GPU under hybrid programming. Using a set of microbenchmarks and applications on a GPU cluster, we show that thread- and process-based context hosting have different tradeoffs. Experimental results on application benchmarks suggest that both thread-based context funneling and process-based context switching natively perform similarly on the latest Fermi GPU, while manually guided context funneling is currently the best way to achieve optimal performance.
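To make the term "context funneling" concrete, the following is a minimal sketch of the general idea, not the paper's implementation: several host threads hand their GPU requests to one owner thread, so all device work goes through a single context. Everything here is an assumption for illustration. The names `FunnelThread`, `run_demo`, and `kernel` are hypothetical, and the "kernel" is a plain Python function standing in for a real device launch.

```python
import queue
import threading
from concurrent.futures import Future


class FunnelThread:
    """Hypothetical funnel: one owner thread runs all submitted device work."""

    def __init__(self):
        self._requests = queue.Queue()
        self._owner = threading.Thread(target=self._serve, daemon=True)
        self._owner.start()

    def _serve(self):
        # In a real GPU program the single device context would be created
        # here, once, and reused for every request (assumption of this sketch).
        while True:
            item = self._requests.get()
            if item is None:  # sentinel: stop serving
                break
            fn, args, fut = item
            try:
                fut.set_result(fn(*args))
            except Exception as exc:
                fut.set_exception(exc)

    def submit(self, fn, *args):
        """Queue a request from any host thread; returns a Future."""
        fut = Future()
        self._requests.put((fn, args, fut))
        return fut

    def shutdown(self):
        self._requests.put(None)
        self._owner.join()


def run_demo(num_hosts=4):
    """Run num_hosts host threads that all funnel work through one owner."""
    funnel = FunnelThread()
    results = [None] * num_hosts
    executors = set()  # identities of threads that actually ran the kernel

    def kernel(xs):
        # Stand-in for a kernel launch: doubles each element on the "device".
        executors.add(threading.get_ident())
        return [2 * x for x in xs]

    def host(i):
        # Each host thread blocks until its funneled request completes.
        results[i] = funnel.submit(kernel, [i, i + 1]).result()

    threads = [threading.Thread(target=host, args=(i,)) for i in range(num_hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    funnel.shutdown()
    return results, len(executors)
```

Calling `run_demo(4)` returns each host's result together with the number of distinct threads that executed the stand-in kernel, which is 1 because every request was funneled through the owner. The process-based alternative discussed in the abstract would instead give each host process its own context and rely on the driver to switch between them.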

Tags: Computer science, CUDA, GPU cluster, nVidia, Performance, Programming techniques, Tesla C1060, Tesla C2070

November 19, 2011 by hgpu
