An Accelerator based on the rho-VEX Processor: an Exploration using OpenCL

Hugo van der Wijst
Department of Electrical Engineering, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology
Delft University of Technology, 2015




In recent years, the use of co-processors to accelerate specific tasks has become increasingly common. To simplify the use of these accelerators in software, the OpenCL framework was developed; it provides programs with a cross-platform interface for using accelerators. The rho-VEX processor is a run-time reconfigurable VLIW processor. It can switch configurations at run time, executing either many contexts with a low issue width or a few contexts with a high issue width. This thesis investigates whether the rho-VEX processor can be used competitively as an OpenCL accelerator. To answer this question, such an accelerator is designed and implemented. By measuring the speed of the individual components of this implementation, a model of the run time of a kernel is built. Using this model, the execution time on an accelerator produced as an ASIC is projected. For the implementation, the rho-VEX processor is instantiated on an FPGA and connected to the host over the PCI Express bus. A Linux kernel driver has been developed that provides interfaces through which user-space applications communicate with the accelerator. These interfaces are used to implement a new device layer for the pocl OpenCL framework. Modeling the execution time revealed three major contributing factors: the data transfer throughput, the kernel compile time, and the kernel execution time. It is projected that an accelerator based on the rho-VEX processor, using similar production technologies and without architectural changes, can achieve between 0.11 and 1.2 times the performance of a modern GPU.
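The run-time model described above splits a kernel launch into three contributing factors. The sketch below shows one way such a model could be structured and then re-evaluated for a faster target such as an ASIC. It is a minimal illustration under stated assumptions, not the thesis's actual model: the additive form, the class name KernelRunModel, all parameter names, and the numbers in the usage example are hypothetical, and the projection assumes that the cycle count and the host-side compile time stay unchanged while only the clock frequency and link throughput improve.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelRunModel:
    """Hypothetical additive run-time model for one OpenCL kernel launch.

    The three terms mirror the factors named in the abstract (data transfer,
    kernel compilation, kernel execution); the additive form and all field
    names are illustrative assumptions.
    """
    bytes_transferred: float  # host <-> accelerator data volume, in bytes
    link_throughput: float    # effective transfer throughput, in bytes/s
    compile_time: float       # kernel compile time on the host, in s
    kernel_cycles: float      # clock cycles the kernel needs on the core
    clock_hz: float           # core clock frequency, in Hz

    def transfer_time(self) -> float:
        # Time to move the kernel's buffers over the interconnect.
        return self.bytes_transferred / self.link_throughput

    def execution_time(self) -> float:
        # Time the kernel itself runs on the accelerator.
        return self.kernel_cycles / self.clock_hz

    def total_time(self) -> float:
        # Assumed additive: no overlap between transfer, compile, and execution.
        return self.transfer_time() + self.compile_time + self.execution_time()

    def project(self, clock_hz: float, link_throughput: float) -> "KernelRunModel":
        # Assumption: same cycle count and compile time on the projected target;
        # only the clock frequency and link throughput change.
        return replace(self, clock_hz=clock_hz, link_throughput=link_throughput)


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the thesis.
    fpga = KernelRunModel(bytes_transferred=4e6, link_throughput=2e8,
                          compile_time=0.5, kernel_cycles=1e8, clock_hz=1e8)
    asic = fpga.project(clock_hz=1e9, link_throughput=2e9)
    print(f"FPGA total: {fpga.total_time():.3f} s")  # FPGA total: 1.520 s
    print(f"ASIC total: {asic.total_time():.3f} s")  # ASIC total: 0.602 s
```

With these made-up numbers, the fixed compile time dominates once the clock and link are sped up. Keeping the factors as separate terms makes this kind of shift visible, rather than only reporting a single end-to-end time.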