Floating-Point Arithmetic in Transport Triggered Architectures

Timo Viitanen
Tampere University of Technology
Tampere University of Technology, 2012

@article{jaaskelainen2012timo,
   title={Floating-Point Arithmetic in Transport Triggered Architectures},
   author={J{\"a}{\"a}skel{\"a}inen, M.P.},
   year={2012}
}

Many computational applications have high performance and energy-efficiency requirements which "off-the-shelf" general-purpose processors cannot meet. On the other hand, designing special-purpose hardware accelerators can be prohibitively expensive in terms of development time. One approach to the problem is to design an Application-Specific Instruction-set Processor (ASIP), which is programmable but tailor-made for the task at hand. The process of customizing an ASIP requires heavy automation to be cost-effective. The TTA-based Codesign Environment (TCE) is an ASIP design toolset based on the highly flexible Transport Triggered Architecture (TTA) processor model, which scales from simple low-power cores up to high-performance VLIW processors. Hardware-accelerated support for floating-point arithmetic is necessary for many applications in scientific computation and digital signal processing, and these applications would especially benefit from the scalability and instruction-level parallelism of TTA. This thesis introduces a comprehensive suite of RTL implementations of floating-point units designed and implemented for the TCE project. The main design requirements were portability and performance on FPGA platforms, even at the cost of reduced standards compliance. The suite includes an option for half-precision arithmetic. In addition, this thesis proposes fast software floating-point division and square root algorithms based on special instructions. The implemented units were verified at the register transfer level using an automated test bench. When benchmarked on an Altera Stratix-II FPGA, the units exhibited performance close to that of the highly optimized units supplied by Altera, while retaining platform independence. On more recent FPGAs such as the Xilinx Virtex-6, finer-grained pipelining is required for maximum performance.