https://hgpu.org/?p=2895
Accelerating Double Precision Floating-point Hessenberg Reduction on FPGA and Multicore Architectures