high performance computing on graphics processing units: hgpu.org

Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone

Guohui Wang, Blaine Rister, Joseph R. Cavallaro

Department of Electrical and Computer Engineering, Rice University, Houston, Texas

1st IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2013

@article{wang2013workload,
  title={Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone},
  author={Wang, Guohui and Rister, Blaine and Cavallaro, Joseph R},
  year={2013}
}

Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detecting and extracting distinctive invariant features from images. However, its high computational complexity makes it difficult to apply the SIFT algorithm in mobile applications. Recent developments in mobile processors have enabled heterogeneous computing on mobile devices such as smartphones and tablets. In this paper, we present an OpenCL-based implementation of the SIFT algorithm on a smartphone that takes advantage of the mobile GPU. We carefully analyze the SIFT workloads and identify their parallelism. We implemented the major steps of the SIFT algorithm both as serial C++ code and as OpenCL kernels targeting mobile processors, in order to compare the performance of different workflows. Based on the profiling results, we partition the SIFT algorithm between the CPU and GPU in a way that best exploits the parallelism and minimizes buffer transfer time. The experimental results show that we achieve 8.5 FPS for keypoint detection and 19 FPS for descriptor generation without reducing the number or the quality of the keypoints. Moreover, the heterogeneous implementation can reduce energy consumption by 41% compared to an optimized CPU-only implementation.

Tags: ARM, Computational Complexity, Computer science, Computer vision, OpenCL, SIFT

September 20, 2013 by hgpu