
Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

Niklas Ulvinge
Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
Chalmers University of Technology, 2014

@article{ulvinge2013increasing,
   title={Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis},
   author={Ulvinge, Niklas},
   year={2013}
}



GPGPU (general-purpose computing on graphics processing units) programming is one promising way to increase performance; unfortunately it is not easily done, because extensive knowledge of the GPU's architecture is required to write programs that outperform CPU programs. Obsidian is an embedded domain-specific language for writing GPGPU kernels that tries to make GPUs more programmable, but it still requires extensive architectural knowledge to write fast kernels. This thesis demonstrates extensions to Obsidian that increase the programmability of graphics processors. The methods described here use static analysis to give the programmer feedback about possible performance bottlenecks and common programming mistakes in their code. The thesis also demonstrates how many kernel-optimization decisions can be automated through code transformations. The resulting domain-specific language improves upon Obsidian by requiring less knowledge of GPU programming and making it easier to write correct programs, while keeping the resulting programs as fast and as expressive. The static analysis provides several kinds of feedback: out-of-bounds checking and race condition detection help determine the correctness of code, while memory access pattern analysis (for coalescing and bank conflict issues), divergent branch detection, unnecessary synchronization detection, and a cost model help find bottlenecks. The code transformations used are scalar depromotion, unnecessary synchronization removal, and some traditional loop transformations that allow an arbitrarily structured program to be transformed into a kernel that runs efficiently on a GPU.
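To give a flavor of the kind of feedback the abstract describes, here is a minimal, illustrative sketch (not Obsidian's actual analysis, and all names are hypothetical) of statically checking one affine array access, `idx = stride * tid + offset`, performed by every thread `tid` in `[0, n_threads)`: it flags both out-of-bounds indices and accesses that cannot coalesce because consecutive threads do not touch consecutive addresses.

```python
def check_affine_access(stride, offset, n_threads, arr_len):
    """Return textual warnings for one affine array access.

    Every thread tid in [0, n_threads) reads index stride * tid + offset
    from an array of length arr_len. Because the index is affine in tid,
    its extreme values occur at tid = 0 and tid = n_threads - 1, so the
    whole range can be checked without enumerating threads.
    """
    warnings = []
    first = offset                                  # index at tid = 0
    last = stride * (n_threads - 1) + offset        # index at the last tid
    lo, hi = min(first, last), max(first, last)
    if lo < 0 or hi >= arr_len:
        warnings.append(
            f"out of bounds: indices span [{lo}, {hi}] "
            f"but array has length {arr_len}")
    # For accesses to coalesce into one memory transaction, consecutive
    # threads should access consecutive elements, i.e. |stride| == 1.
    if abs(stride) != 1:
        warnings.append(f"uncoalesced access: thread stride is {stride}")
    return warnings

# A clean unit-stride access, then a strided one that also runs past the end.
print(check_affine_access(stride=1, offset=0, n_threads=256, arr_len=256))
print(check_affine_access(stride=2, offset=0, n_threads=256, arr_len=256))
```

The real analyses in the thesis operate on the EDSL's intermediate representation rather than on four integers, but the principle is the same: because index expressions are visible to the embedded language, bounds and access patterns can be checked symbolically before the kernel ever runs.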
