high performance computing on graphics processing units: hgpu.org

Parallelization and Optimization of Feature Detection Algorithms on Embedded GPU

Seung Heon Kang, Seung-Jae Lee, In Kyu Park

Department of Information and Communication Engineering, Inha University, Incheon 402-751, Korea

International Workshop on Advanced Image Technology (IWAIT’14), 2014

In this paper, we parallelize and optimize two popular feature detection algorithms, SIFT and SURF, on the latest embedded GPU. Using the conventional OpenGL Shading Language and the more recently developed OpenCL as GPGPU software platforms, we compare the two platforms in terms of implementation efficiency and speed, and we also compare GPU performance against CPU performance. Experimental results show that implementation with OpenCL is more efficient, while its speed is comparable to that of OpenGL. Compared with the embedded CPU in the same application processor, the embedded GPU runs 4-5 times faster. Furthermore, we measure and compare the power consumption of each implementation; the measurements show that OpenCL consumes less energy than OpenGL.
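To illustrate why feature detectors such as SIFT lend themselves to GPU parallelization, the sketch below shows the per-pixel extremum test from SIFT's difference-of-Gaussians (DoG) stage in plain Python. It is not the authors' implementation, which the abstract does not describe: the function names, the nested-list data layout, and the demo data are all assumptions made for illustration, and the contrast and edge filtering that full SIFT applies afterwards are omitted.

```python
# Sketch only: SIFT-style scale-space extremum test, written serially.
# Names and data layout are hypothetical, not taken from the paper.

def is_local_extremum(dog, s, y, x):
    """Return True if dog[s][y][x] is strictly greater or strictly smaller
    than all 26 neighbours in the 3x3x3 block (scale, row, column)."""
    center = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbours) or center < min(neighbours)


def detect_extrema(dog):
    """Scan every interior position of a DoG stack indexed as dog[s][y][x].
    Each test reads only its own neighbourhood and writes nothing shared,
    so on a GPU each (s, y, x) could be handled by a separate work-item."""
    scales, height, width = len(dog), len(dog[0]), len(dog[0][0])
    return [
        (s, y, x)
        for s in range(1, scales - 1)
        for y in range(1, height - 1)
        for x in range(1, width - 1)
        if is_local_extremum(dog, s, y, x)
    ]


def make_demo_stack(peak=1.0):
    """Hypothetical 3x3x3 DoG stack: all zeros except a value in the centre."""
    stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    stack[1][1][1] = peak
    return stack
```

Calling `detect_extrema(make_demo_stack())` reports the single centre position. In an OpenCL or GLSL port, the three nested loops in `detect_extrema` would typically be replaced by the kernel or fragment index, with the body of `is_local_extremum` executed once per pixel.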

Tags: Algorithms, ARM, Computer science, Computer vision, OpenCL, OpenGL, SIFT

January 14, 2014 by hgpu
