high performance computing on graphics processing units: hgpu.org

Three Dimensional Fast Fourier Transform CUDA Implementation

Kumar Aatish, Boyan Zhang

Department of Computer Science and Engineering, University of California, San Diego

University of California, CSE260 Project Report, 2012

@article{aatish2012three,
  title={Three Dimensional Fast Fourier Transform CUDA Implementation},
  author={Aatish, Kumar and Zhang, Boyan},
  year={2012}
}

A three-dimensional DFT can be expressed as three sets of one-dimensional DFTs, applied to the three-dimensional data along each dimension in turn. Each of these one-dimensional DFTs can be computed efficiently owing to the properties of the transform; the class of algorithms that does so is known as the Fast Fourier Transform (FFT). We introduce the one-dimensional FFT algorithm in this section, which will be used in our GPU implementation. Our implementation performs the FFT along the row-major dimension of a given three-dimensional matrix, one dimension at a time. Thus, the complete 3D FFT is a sequence of 1D FFT kernels and transpose kernels, where the transposes bring the desired coordinate axis into row-major position to enable coalesced global reads. The implemented kernel performs a single-precision 1D FFT and uses the fast math functions to calculate the sin and cos of the phases corresponding to the twiddle factors. The GPU used is the C2050 Fermi GPU on the DIRAC cluster on the Carver system provided by the National Energy Research Scientific Computing Center. The NVIDIA Tesla C2050 has 448 parallel CUDA processor cores with 3 GB of memory.
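The decomposition described in the abstract can be illustrated with a small CPU reference sketch. It is not the authors' CUDA kernel. It assumes a recursive radix-2 FFT with power-of-two axis lengths, and the function names (`fft1d`, `fft_rows`, `rotate_axes`, `fft3d`, `dft3d_direct`) are hypothetical. The 1D FFT is always applied along the last (contiguous) axis, and an axis rotation plays the role of the transpose kernels by bringing the next axis into that position. The twiddle factors come from `cmath.exp` here, whereas the report states that the GPU kernel uses fast math functions for the sin and cos of the phase.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor exp(-2*pi*i*k/n) = cos(phase) + i*sin(phase).
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft_rows(a):
    """Apply the 1D FFT along the last axis of a nested-list 3D array."""
    return [[fft1d(row) for row in plane] for plane in a]


def rotate_axes(a):
    """Return b with b[k][i][j] = a[i][j][k] (stand-in for a transpose kernel)."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    return [[[a[i][j][k] for j in range(n1)] for i in range(n0)]
            for k in range(n2)]


def fft3d(a):
    """3D FFT as three rounds of (1D FFT along last axis, then rotate axes)."""
    for _ in range(3):
        a = rotate_axes(fft_rows(a))
    # Three rotations restore the original axis order.
    return a


def dft3d_direct(a):
    """Direct 3D DFT from the definition, used only as a correctness check."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])

    def term(u, v, w):
        return sum(
            a[i][j][k]
            * cmath.exp(-2j * cmath.pi * (u * i / n0 + v * j / n1 + w * k / n2))
            for i in range(n0) for j in range(n1) for k in range(n2)
        )

    return [[[term(u, v, w) for w in range(n2)] for v in range(n1)]
            for u in range(n0)]
```

On a small input such as a 2 x 4 x 2 array, `fft3d` and `dft3d_direct` should agree to within floating-point rounding. On a GPU the rotation step has a practical motivation as well as a mathematical one: it keeps the axis being transformed contiguous in memory, which is what allows coalesced global reads.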

Tags: Algorithms, Computer science, CUDA, FFT, nVidia, Tesla C2050

January 15, 2013 by hgpu
