high performance computing on graphics processing units: hgpu.org

A Compiler for Throughput Optimization of Graph Algorithms on GPUs

Sreepathi Pai, Keshav Pingali

The University of Texas at Austin, USA

OOPSLA ’16, 2016

Writing high-performance GPU implementations of graph algorithms can be challenging. In this paper, we argue that three optimizations, called throughput optimizations, are key to high performance for this application class. These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand. To address this problem, we have implemented these optimizations in a compiler that produces CUDA code from an intermediate-level program representation called IrGL. Compared to state-of-the-art handwritten CUDA implementations of eight graph applications, code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never more than 30% slower for the others. Throughput optimizations contribute an improvement of up to 4.16x (median 1.4x) to the performance of unoptimized IrGL code.

Tags: Compilers, Computer science, CUDA, Graph theory, nVidia, Tesla K40

September 20, 2016 by hgpu
