high performance computing on graphics processing units: hgpu.org

Swizzle Inventor: Data Movement Synthesis for GPU Kernels

Phitchaya Mangpo Phothilimthana, Archibald Samuel Elliott, An Wang, Abhinav Jangda, Bastian Hagedorn, Henrik Barthels, Samuel J. Kaufman, Vinod Grover, Emina Torlak, Rastislav Bodik

University of California, Berkeley

Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019

Utilizing memory and register bandwidth in modern architectures may require irregular data placement and movement, such as shuffles and broadcasts. We develop Swizzle Inventor to help programmers implement swizzle algorithms: the programmer writes a program that omits the swizzles and delegates their creation to an automatic synthesizer. Our synthesis algorithm scales to real-world programs, allowing us to invent new GPU kernels for stencil computations, matrix transposition, and a finite-field multiplication algorithm (used in cryptographic applications). The synthesized 2D convolution and finite-field multiplication kernels are on average 1.5–3.2x and 1.1–1.7x faster, respectively, than expert-optimized CUDA kernels.
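To make "swizzle" concrete, here is a minimal Python model of a register-level matrix transpose inside one warp. It is not the kernel Swizzle Inventor synthesizes. It is a sketch of the kind of irregular data movement the abstract describes, built from a well-known rotate, shuffle, rotate pattern. The helper `shfl` is hypothetical: it only mimics the semantics of CUDA's `__shfl_sync`, where every lane contributes one value and may read the value of any lane. The tile layout `regs[lane][r]` is also an assumption made for illustration.

```python
from typing import List


def shfl(values: List[int], src_lanes: List[int]) -> List[int]:
    """Hypothetical model of a warp shuffle: lane i receives values[src_lanes[i]]."""
    return [values[src] for src in src_lanes]


def warp_transpose(regs: List[List[int]]) -> List[List[int]]:
    """Transpose an n x n tile stored as regs[lane][r] = A[lane][r].

    Each shuffle exchanges the same register index k across all lanes,
    so the data exchange itself needs no lane-dependent register indexing.
    """
    n = len(regs)
    # 1. Pre-rotate: lane j keeps rot[j][k] = A[j][(j - k) % n].
    rot = [[regs[j][(j - k) % n] for k in range(n)] for j in range(n)]
    # 2. Shuffle: in step k, lane i reads register k of lane (i + k) % n,
    #    which holds A[(i + k) % n][i].
    recv = [[0] * n for _ in range(n)]
    for k in range(n):
        register_k = [rot[j][k] for j in range(n)]  # register k of every lane
        got = shfl(register_k, [(i + k) % n for i in range(n)])
        for i in range(n):
            recv[i][k] = got[i]
    # 3. Post-rotate: lane i places A[(i + k) % n][i] at position (i + k) % n,
    #    so out[i][m] = A[m][i].
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            out[i][(i + k) % n] = recv[i][k]
    return out


if __name__ == "__main__":
    A = [[4 * r + c for c in range(4)] for r in range(4)]
    assert warp_transpose(A) == [list(col) for col in zip(*A)]
```

The lane-dependent rotations in steps 1 and 3 are the hard part on real hardware, because dynamic register indexing can spill to local memory. Working out index expressions like `(j - k) % n` by hand is exactly the tedious work the paper proposes to hand to a synthesizer.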

Tags: Code generation, Computer science, CUDA, nVidia, nVidia GeForce GTX Titan X, nVidia Quadro GV100

February 3, 2019 by hgpu