high performance computing on graphics processing units: hgpu.org


C-for-Metal: High Performance SIMD Programming on Intel GPUs

Guei-Yuan Lueh, Kaiyu Chen, Gang Chen, Joel Fuentes, Wei-Yu Chen, Fangwen Fu, Hong Jiang, Hongzheng Li, Daniel Rhee

Intel Corporation

arXiv:2101.11049 [cs.DC] (26 Jan 2021)

@misc{lueh2021cformetal,
  title={C-for-Metal: High Performance SIMD Programming on Intel GPUs},
  author={Guei-Yuan Lueh and Kaiyu Chen and Gang Chen and Joel Fuentes and Wei-Yu Chen and Fangwen Fu and Hong Jiang and Hongzheng Li and Daniel Rhee},
  year={2021},
  eprint={2101.11049},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}


The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by the compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications, as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap, we introduce C-for-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control, and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
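The contrast between implicit SIMT code and explicit SIMD code can be illustrated with a small CPU-side sketch in standard C++. This is not CM code and does not use the CM API: the `Vec` type, its `select` member, and the functions `simt_add` and `pairwise_sum` are hypothetical names introduced only for illustration. The sketch assumes a vector type whose width is chosen by the programmer and whose lanes can be read with an offset and stride, which is one way to picture SIMD size control and cross-lane data sharing.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width vector standing in for a SIMD register.
// NOT the CM vector type; it only models the idea on the CPU.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lane{};

    // Element-wise add: one logical operation over all N lanes.
    Vec operator+(const Vec& other) const {
        Vec out;
        for (std::size_t i = 0; i < N; ++i) {
            out.lane[i] = lane[i] + other.lane[i];
        }
        return out;
    }

    // Strided read of M lanes starting at `offset` (assumed helper).
    // The caller must keep offset + (M - 1) * stride below N.
    template <std::size_t M>
    Vec<T, M> select(std::size_t offset, std::size_t stride) const {
        Vec<T, M> out;
        for (std::size_t i = 0; i < M; ++i) {
            out.lane[i] = lane[offset + i * stride];
        }
        return out;
    }
};

// SIMT style: the programmer writes a scalar program for one lane;
// the SIMD width and neighboring lanes are not visible here.
inline float simt_add(float a, float b) { return a + b; }

// Explicit SIMD style: the programmer fixes the width at 8 lanes and
// adds the even lanes to the odd lanes, sharing data across lanes.
inline Vec<float, 4> pairwise_sum(const Vec<float, 8>& v) {
    return v.select<4>(0, 2) + v.select<4>(1, 2);
}
```

In this sketch, `pairwise_sum` on the lanes 0 through 7 adds each even lane to the odd lane next to it within a single vector value. A scalar per-lane program such as `simt_add` has no direct way to name its neighbor's data, which is the kind of capability the abstract says the SIMT abstraction leaves unused.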

Tags: Computer science, CUDA, nVidia, OpenCL, Package, Programming Languages

January 31, 2021 by hgpu
