From Parallel Programs to Customized Parallel Processors



Pekka Jääskeläinen

Tampere University of Technology, Tampere

Tampere University of Technology, 2012

@article{jaaskelainen2012parallel,
  title={From Parallel Programs to Customized Parallel Processors},
  author={J{\"a}{\"a}skel{\"a}inen, P.},
  journal={Tampereen teknillinen yliopisto. Julkaisu-Tampere University of Technology. Publication; 1086},
  year={2012}
}


The need for fast time to market in new embedded processor-based designs calls for a rapid methodology for designing the processors they include. The need is even greater for so-called soft cores targeted at reconfigurable fabrics, where per-design processor customization is commonplace. The C language has been commonly used as an input to hardware/software co-design flows. However, because C is a sequential language, it is poorly suited to generating parallel operations that exploit naturally parallel hardware constructs, which leads to a customized processor design space with limited parallel resource scalability. In contrast, when a parallel programming language is used as the input, a wider processor design space can be explored to produce customized processors with varying degrees of utilized parallelism. This thesis proposes a novel Multicore Application-Specific Instruction Set Processor (MCASIP) co-design methodology that uses parallel programming languages as the application input format. In the methodology, the designer can explicitly capture the parallelism of the algorithm and exploit specialized instructions using a parallel programming language, rather than being at the mercy of the compiler or the hardware to extract the parallelism from a sequential input. The thesis proposes a multicore processor template based on the Transport Triggered Architecture, compiler techniques for static parallelization of computation kernels with barriers, and a datapath-integrated hardware accelerator for implementing software synchronization with low overhead. These contributions enable scaling the customized processors at both the instruction and task levels to efficiently exploit the parallelism in the input program, up to implementation constraints such as memory bandwidth or chip area. The contributions are validated with case studies, comparisons, and design examples.
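As a rough illustration of what static parallelization of a kernel with a barrier can look like, the sketch below splits a small kernel at its barrier into two regions and runs each region in a loop over all work-items. This is one common way to handle barriers at compile time and is not taken from the thesis; the kernel body, the function name run_work_group, and the variable names are all hypothetical.

```python
# Hypothetical kernel, as it might be written per work-item i in a
# data-parallel language (an assumption for illustration only):
#   local[i] = data[i] * 2
#   barrier()
#   out[i] = local[i] + local[(i + 1) % n]


def run_work_group(data):
    """Run one work-group on a single core by splitting the kernel at its barrier."""
    n = len(data)
    local = [0] * n  # stands in for work-group local memory
    out = [0] * n

    # Region 1: the code before the barrier, executed for every work-item.
    for i in range(n):
        local[i] = data[i] * 2

    # The barrier needs no runtime synchronization here: region 1 has
    # completed for all work-items before region 2 starts.

    # Region 2: the code after the barrier, executed for every work-item.
    for i in range(n):
        out[i] = local[i] + local[(i + 1) % n]

    return out


result = run_work_group([1, 2, 3, 4])
print(result)
```

Because the iterations inside each loop are independent, a compiler is free to schedule them in parallel at the instruction level, while the split between the loops preserves the ordering that the barrier requires.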

Tags: Algorithms, Compilers, Computer science, FPGA, High-level Languages, nVidia, nVidia GeForce 9400, OpenCL, Thesis

November 27, 2012 by hgpu


