high performance computing on graphics processing units: hgpu.org

Development of Parallel Architectures for Radar/Video Signal Processing Applications

Amin Jarrah

University of Toledo

University of Toledo, 2014

@phdthesis{jarrah2014doctor,
  title={Development of Parallel Architectures for Radar/Video Signal Processing Applications},
  author={Jarrah, Amin},
  year={2014},
  school={The University of Toledo}
}


Digital signal processing is used in a growing range of areas, including radar tracking, image processing, medical imaging, video broadcasting, and control algorithms for sensor array processing. Most signal processing applications are computationally intensive and may fail to meet real-time requirements. However, almost all processor manufacturers have embraced multi-core designs, and the road to the future is through parallel processing. Many parallel processing platforms have been developed for high performance, including 1) multi-core/many-core processors, 2) Graphics Processing Units (GPUs), and 3) Field Programmable Gate Arrays (FPGAs). This research develops optimized parallel architectures for many signal processing applications, including the Extensive Cancellation Algorithm (ECA), Direct Data Domain (D3), the Block Compressive Sampling Matching Pursuit algorithm (BCoSaMP), video processing, the Discrete Wavelet Transform (DWT), the Particle Filter (PF), and Iterative Hard Thresholding (IHT), on multi-core, FPGA, and GPU platforms. This is done by identifying computation and storage that can be eliminated, in order to achieve high performance and meet real-time requirements. Techniques and ideas are also drawn from other advanced fields to increase the clarity and usefulness of the research. A new generalized method is proposed that may help researchers in various areas. The applications are then raised to a higher level by implementing interfaces. This makes each application adaptable, since all of its input parameters can be specified, and supports fast prototyping through different performance evaluations. We propose and exploit many parallelization methods and optimization techniques to improve latency, hardware usage, power consumption, cost, and reliability.
These parallelization methods predict the data path and the control unit of the application processes. The applications are also examined from the standpoint of numerical algorithms, to provide a transition from research theory to practice and to improve computational and resource requirements by adapting each algorithm for high-performance applications. We couple these techniques with high-level synthesis tools, which enable rapid development by generating efficient parallel code from high-level problem descriptions. This reduces design time, increases productivity, improves reliability, and enables exploration of the design space. The approaches include optimizations based on mathematical and/or statistical reasoning, set theory, logic, and auto-tuning techniques. Hardware/software co-design has been performed for these applications to push performance and energy efficiency while reducing cost, area, and overhead. This was accomplished by developing a tool called the Radar Signal Processing Tool (RSPT). RSPT lets the designer auto-generate a fully optimized VHDL representation of any of these signal processing algorithms by specifying user input parameters through a Graphical User Interface (GUI). This offers great flexibility in designing signal processing applications for a System on Chip (SoC) without writing a single line of VHDL code. RSPT also communicates with the Xilinx toolset to check which FPGA parts are installed with it and to execute the VHDL synthesis command chain. Moreover, it applies optimization techniques such as pipelining, code inlining, loop unrolling, loop merging, and dataflow techniques, which allow concurrent execution of operations to improve throughput and latency. Finally, RSPT gives the designer feedback on performance parameters such as occupied slices, maximum frequency, and dynamic range.
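Two of the optimizations named above, loop merging and loop unrolling, can be illustrated with a minimal Python sketch. This is not RSPT output and not code from the thesis. The function names and the unroll factor of 4 are assumptions chosen for illustration. In an FPGA flow the same restructuring would be applied by the synthesis tool to create parallel hardware, whereas here it only shows how the loops change shape.

```python
def scale_then_offset_separate(samples, gain, offset):
    """Baseline: two separate loops, with an intermediate buffer between them."""
    scaled = [gain * s for s in samples]
    return [v + offset for v in scaled]


def scale_then_offset_merged(samples, gain, offset):
    """Loop merging: both operations in one pass, no intermediate buffer."""
    return [gain * s + offset for s in samples]


def dot_unrolled_by_4(a, b):
    """Loop unrolling by a factor of 4 (factor is an illustrative assumption).

    The four accumulators do not depend on each other, so hardware could
    compute them concurrently. Assumes len(a) == len(b).
    """
    n = len(a)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    acc0 = acc1 = acc2 = acc3 = 0.0
    for i in range(0, limit, 4):
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    # Leftover elements when n is not a multiple of 4.
    tail = sum(a[i] * b[i] for i in range(limit, n))
    return acc0 + acc1 + acc2 + acc3 + tail
```

The merged version removes one pass over the data and one temporary array, which corresponds to the storage elimination the abstract describes. The unrolled version trades more arithmetic units for fewer loop iterations.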
The RSPT feedback lets the designer adjust the algorithm components until the desired performance of the overall SoC is achieved. A parallel approach to IR video processing is also proposed, since IR video processing is widely used in many applications yet often fails to meet real-time requirements. The energy dissipation of a heterogeneous Network on Chip (NoC) based Multiprocessor System on Chip (MPSoC) platform running a video application is analyzed and assessed. The analysis identifies the latency, area, and energy bottlenecks of the entire heterogeneous platform, including processors, interconnection wires, routers, memory, and caches. We also propose a new modeling and simulation approach for channel width and buffer sizing, both of which strongly affect the performance and overhead of the chip. This approach monitors the state of each link in the NoC topology. Then, based on the congestion spots and the critical path, the design is optimized by changing channel width and buffer size until the desired performance is reached.
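The iterative sizing loop described for the NoC can be sketched roughly as follows. This is a toy model, not the thesis's simulator. The `Link` class, the `tune` function, the 0.8 utilization threshold, and the use of the M/M/1 mean-occupancy formula rho / (1 - rho) as a congestion estimate are all assumptions made for illustration. The critical-path analysis mentioned in the abstract is omitted.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical model of one NoC link."""
    name: str
    load: float            # offered traffic in flits per cycle (assumed unit)
    width: int = 1         # channel width in flits per cycle
    buffer_depth: int = 2  # input buffer size in flits

    def utilization(self):
        return self.load / self.width


def mean_occupancy(link):
    """M/M/1 mean occupancy rho / (1 - rho); infinite when overloaded."""
    rho = link.utilization()
    return float("inf") if rho >= 1.0 else rho / (1.0 - rho)


def is_congested(link):
    """A link is a congestion spot if its expected queue overflows the buffer."""
    return mean_occupancy(link) > link.buffer_depth


def tune(links, max_util=0.8, max_steps=100):
    """Monitor every link, then widen or deepen the worst one until none is congested.

    Returns the number of adjustment steps taken.
    """
    for step in range(max_steps):
        hot = [link for link in links if is_congested(link)]
        if not hot:
            return step
        worst = max(hot, key=lambda link: link.utilization())
        if worst.utilization() > max_util:
            worst.width += 1         # too much traffic: widen the channel
        else:
            worst.buffer_depth += 1  # bursty but sustainable: deepen the buffer
    raise RuntimeError("no feasible sizing found within max_steps")
```

For example, calling `tune` on a link with `load=1.5` first widens the channel, because the link is overloaded, and then deepens the buffer once utilization drops below the threshold. A real flow would replace the closed-form estimate with measurements from simulation.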

Tags: Algorithms, CUDA, Discrete Wavelet Transform, FPGA, Heterogeneous systems, Image processing, nVidia, nVidia GeForce GTX 260, Signal processing, Thesis

May 12, 2015 by hgpu
