high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » CUDA » Efficient Implementation of RLS-Based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

Efficient Implementation of RLS-Based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

A. Hirano, K. Nakayama

Kanazawa University

Proc. of 27th SIP Symposium

@inproceedings{hirano2012efficient,

title={Efficient Implementation of RLS-Based Adaptive Filterson nVIDIA GeForce Graphics Processing Unit},

author={Hirano, Akihiro and Nakayama, Kenji},

booktitle={第 27 回信号処理シンポジウム講演論文集= Proc. of 27th SIP Symposium},

number={2012},

pages={241–245},

year={2012}

}

Download (PDF)

View

Source

2748

views

This paper presents efficient implementation of RLS-based adaptive filters with a large number of taps on nVIDIA GeForce graphics processing unit (GPU) and CUDA software development environment. Modification of the order and the combination of calculations reduces the number of accesses to slow off-chip memory. Assigning tasks into multiple threads also takes memory access order into account. Multiple shader processor arrays are used to handle a large matrix. For a 8192-tap case, a GPU program is almost 30-times faster than a CPU program. Real-time processing is possible for an 8kHz-sampling and 512-tap case by using 32 shader processors, which is only 25% of GeForce 8800GTS.

Tags: CUDA, Filtering, nVidia, nVidia GeForce 8800 GTS, Signal processing

September 5, 2013 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Efficient Implementation of RLS-Based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

Your response

Recent source codes

Awesome LLM-Driven Kernel Generation

PhysProver: Advancing Automatic Theorem Proving for Physics

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

SeedFold: Scaling Biomolecular Structure Prediction

Tilus: A Tile-Level GPU Kernel Programming Language

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

BoltzGen:Toward Universal Binder Design

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

MATLAB Tensor Core models

Most viewed papers (last 30 days)

Efficient Implementation of RLS-Based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)