Extended-precision floating-point numbers for GPU computation

Andrew Thall

Alma College

In SIGGRAPH ’06: ACM SIGGRAPH 2006 Research posters (2006)

DOI:10.1145/1179622.1179682

Double-float (df64) and quad-float (qf128) numeric types can be implemented on current GPU hardware and used efficiently and effectively for extended-precision computational arithmetic. Using unevaluated sums of paired or quadrupled f32 single-precision values, these numeric types provide approximately 48 and 96 bits of mantissa respectively at single-precision exponent ranges for computer graphics, numerical, and general-purpose GPU programming. This paper surveys current art, presents algorithms and Cg implementation for arithmetic, exponential and trigonometric functions, and presents data on numerical accuracy on several different GPUs. It concludes with an in-depth discussion of the application of extended precision primitives to performing fast Fourier transforms on the GPU for real and complex data. [Addendum (July 2009): the presence of IEEE compliant double-precision hardware in modern GPUs from NVidia and other manufacturers has reduced the need for these techniques. The double-precision capabilities can be accessed using CUDA or other GPGPU software, but are not (as of this writing) exposed in the graphics pipeline for use in Cg-based shader code. Shader writers or those still using a graphics API for their numerical computing may still find the methods described herein to be of interest.]
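To make the phrase "unevaluated sums of paired f32 values" concrete, here is a minimal sketch of a df64 value and its addition. It is written in Python rather than Cg, and it emulates f32 arithmetic by rounding every intermediate result to binary32 through the standard `struct` module. The helper names (`f32`, `two_sum`, `quick_two_sum`, `df64_from_float`, `df64_add`) are illustrative assumptions, not identifiers from the paper, and the addition shown is the widely used "fast" double-word formulation built on Knuth's and Dekker's error-free sums; the paper's own Cg routines may differ in detail.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's error-free sum: returns (s, e) with s = fl32(a + b) and s + e == a + b."""
    s = f32(a + b)
    v = f32(s - a)
    e = f32(f32(a - f32(s - v)) + f32(b - v))
    return s, e


def quick_two_sum(a: float, b: float) -> tuple[float, float]:
    """Dekker's cheaper error-free sum, valid only when |a| >= |b|."""
    s = f32(a + b)
    e = f32(b - f32(s - a))
    return s, e


def df64_from_float(x: float) -> tuple[float, float]:
    """Split a binary64 value into an unevaluated (hi, lo) pair of binary32 values."""
    hi = f32(x)
    lo = f32(x - hi)  # what the high word could not hold, rounded to f32
    return hi, lo


def df64_add(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two df64 values (assumed formulation: the 'fast' double-word add)."""
    s, e = two_sum(x[0], y[0])          # exact sum of the high words
    e = f32(e + f32(x[1] + y[1]))       # fold in the low words
    return quick_two_sum(s, e)          # renormalize so |lo| stays small relative to hi


# Usage: plain f32 addition drops the small term, while the df64 pair
# keeps it in the low word.
single = f32(1.0 + 2.0 ** -30)
paired = df64_add(df64_from_float(1.0), df64_from_float(2.0 ** -30))
```

The rounding helper stands in for native f32 hardware operations; on a GPU each `f32(...)` call would simply be a single-precision add or subtract, provided the compiler does not reassociate or fuse them.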

Tags: Algorithms, Cg, Computer science, Extended precision, FFT, nVidia, nVidia GeForce 7900 GT, nVidia GeForce 8800 GT

December 3, 2010 by hgpu
