high performance computing on graphics processing units: hgpu.org

Accelerating Boosting-based Face Detection on GPUs

David Oro, Carles Fernandez, Carlos Segura, Xavier Martorell, Javier Hernando

Herta Security, Barcelona, Spain

41st International Conference on Parallel Processing, 2012

@inproceedings{oro2012accelerating,
  title={Accelerating Boosting-based Face Detection on GPUs},
  author={Oro, David and Fernandez, Carles and Segura, Carlos and Martorell, Xavier and Hernando, Javier},
  booktitle={41st International Conference on Parallel Processing},
  year={2012}
}

The goal of face detection is to determine whether faces are present in arbitrary images, along with their locations and dimensions. As with other graphics workloads, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively on high-definition video. To address this challenge, face detection algorithms typically avoid unnecessary computation by dynamically evaluating a boosted cascade of classifiers, which discards most non-face windows in its early stages. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on wide SIMD units to exploit data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, the most resource-intensive part of the face detection pipeline. In particular, combining concurrent kernel execution with cascades generated by the GentleBoost algorithm solves the problem of GPU underutilization and achieves an average 5X speedup on 1080p video over the fastest known implementations, while slightly improving accuracy. Finally, we study the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task-level and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.
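The cascade evaluation described in the abstract can be sketched as follows. This is a minimal, single-threaded, single-scale Python illustration under stated assumptions, not the authors' GPU kernel: each stage sums GentleBoost-style regression stumps (real-valued outputs on either side of a threshold) computed over a hypothetical two-rectangle Haar-like feature read from an integral image, and a window is rejected as soon as one stage score falls below that stage's threshold. All names (`Stump`, `Stage`, `evaluate_cascade`, `detect`) and all threshold values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]
Table = List[List[int]]


def integral_image(img: Image) -> Table:
    """Summed-area table padded with a zero row and column:
    ii[y][x] holds the sum of img rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Table, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in [x, x+w) x [y, y+h) from four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


@dataclass
class Stump:
    """GentleBoost-style regression stump over a hypothetical two-rectangle
    Haar-like feature (left half minus right half of a box)."""
    x: int
    y: int
    w: int
    h: int
    threshold: float
    below: float  # real-valued response when feature <= threshold
    above: float  # real-valued response when feature > threshold

    def evaluate(self, ii: Table, ox: int, oy: int) -> float:
        half = self.w // 2
        left = rect_sum(ii, ox + self.x, oy + self.y, half, self.h)
        right = rect_sum(ii, ox + self.x + half, oy + self.y, half, self.h)
        return self.below if left - right <= self.threshold else self.above


@dataclass
class Stage:
    stumps: List[Stump]
    stage_threshold: float


def evaluate_cascade(ii: Table, stages: List[Stage], ox: int, oy: int) -> int:
    """Return how many stages the window at (ox, oy) passed.
    A return value of len(stages) means the window is accepted as a face."""
    for i, stage in enumerate(stages):
        score = sum(s.evaluate(ii, ox, oy) for s in stage.stumps)
        if score < stage.stage_threshold:
            return i  # early rejection: the remaining stages are skipped
    return len(stages)


def detect(img: Image, stages: List[Stage], win: int,
           step: int = 1) -> List[Tuple[int, int]]:
    """Slide a win x win window over img (one scale only) and
    collect the origins of windows that pass every stage."""
    ii = integral_image(img)
    h, w = len(img), len(img[0])
    hits = []
    for oy in range(0, h - win + 1, step):
        for ox in range(0, w - win + 1, step):
            if evaluate_cascade(ii, stages, ox, oy) == len(stages):
                hits.append((ox, oy))
    return hits
```

Because `evaluate_cascade` returns after a different number of stages for different windows, a naive one-window-per-thread GPU mapping leaves many SIMD lanes idle once their neighbors have been rejected. That is the low ALU occupancy the paper sets out to address.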

Tags: Algorithms, Computer science, Computer vision, CUDA, H.264/AVC, nVidia, nVidia GeForce GTX 470

September 10, 2012 by hgpu
