Floating-Point Arithmetic in Transport Triggered Architectures

Timo Viitanen

Tampere University of Technology

Tampere University of Technology, 2012

@article{jaaskelainen2012timo,
  title={Floating-Point Arithmetic in Transport Triggered Architectures},
  author={Viitanen, Timo},
  year={2012}
}

Many computational applications have performance and energy-efficiency requirements that "off-the-shelf" general-purpose processors cannot meet. On the other hand, designing special-purpose hardware accelerators can be prohibitively expensive in development time. One approach to the problem is to design an Application-Specific Instruction-set Processor (ASIP), which is programmable but tailor-made for the task at hand. Customizing an ASIP requires heavy automation to be cost-effective. The TTA-based Codesign Environment (TCE) is an ASIP design toolset based on the highly flexible Transport Triggered Architecture (TTA) processor model, which scales from simple low-power cores up to high-performance VLIW processors. Hardware-accelerated floating-point arithmetic is necessary for many applications in scientific computing and digital signal processing, fields that would especially benefit from the scalability and instruction-level parallelism of TTA.

This thesis introduces a comprehensive suite of RTL implementations of floating-point units designed and implemented for the TCE project. The main design requirements were portability and performance on FPGA platforms, even at the cost of reduced standards compliance. The suite includes an option for half-precision arithmetic. In addition, the thesis proposes fast software floating-point division and square-root algorithms based on special instructions.

The implemented units were verified at the register-transfer level using an automated test bench. When benchmarked on an Altera Stratix II FPGA, the units performed close to the highly optimized units supplied by Altera while retaining platform independence. On more recent FPGAs such as the Xilinx Virtex-6, finer-grained pipelining is required for maximum performance.
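The abstract does not spell out how the software division and square-root algorithms work. As a hedged illustration of the general idea (a cheap approximation instruction refined in software), the following sketch uses the textbook Newton-Raphson method. The seed functions `recip_seed` and `rsqrt_seed` are hypothetical stand-ins for hardware approximation instructions, and the seed formulas and iteration counts are assumptions chosen for illustration; none of this is taken from the thesis. Results are not guaranteed to be correctly rounded as IEEE 754 requires, and special operands (zero divisors, infinities, NaN) are rejected rather than handled.

```python
import math

# Hypothetical stand-ins for "special instructions": each returns a cheap,
# low-precision approximation that software then refines. These are NOT the
# instructions defined in the thesis.


def recip_seed(m: float) -> float:
    """Approximate 1/m for m in [0.5, 1); max relative error is 1/17."""
    return 48.0 / 17.0 - (32.0 / 17.0) * m


def rsqrt_seed(m: float) -> float:
    """Approximate 1/sqrt(m) for m in [0.25, 1) by a chord (error < 19%)."""
    return 2.0 - (4.0 / 3.0) * (m - 0.25)


def nr_divide(a: float, b: float, iterations: int = 4) -> float:
    """Compute a / b by refining an approximate reciprocal of b."""
    if b == 0.0 or not math.isfinite(b):
        raise ValueError("special divisors are out of scope for this sketch")
    m, e = math.frexp(abs(b))  # |b| = m * 2**e with m in [0.5, 1)
    x = recip_seed(m)
    for _ in range(iterations):
        x = x * (2.0 - m * x)  # relative error is squared each step
    recip = math.copysign(math.ldexp(x, -e), b)  # 1/b = (1/m) * 2**-e
    return a * recip


def nr_sqrt(v: float, iterations: int = 5) -> float:
    """Compute sqrt(v) by refining an approximate reciprocal square root."""
    if v < 0.0 or not math.isfinite(v):
        raise ValueError("negative or non-finite input is out of scope")
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)  # v = m * 2**e with m in [0.5, 1)
    if e % 2:  # make the exponent even so it can be halved exactly
        m *= 0.5
        e += 1
    y = rsqrt_seed(m)  # m is now in [0.25, 1)
    for _ in range(iterations):
        y = y * (3.0 - m * y * y) * 0.5  # Newton step for 1/sqrt(m)
    return math.ldexp(m * y, e // 2)  # sqrt(v) = m * (1/sqrt(m)) * 2**(e/2)


if __name__ == "__main__":
    print(nr_divide(1.0, 3.0), nr_sqrt(2.0))
```

For the reciprocal, each step maps a relative error ε to -ε², so a seed accurate to about 6% reaches double precision in four steps, and three steps already suffice for single precision. This is why a small approximation instruction plus a handful of multiplications can stand in for a full hardware divider or square-root unit.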

October 24, 2012 by hgpu
