Accelerating Boosting-based Face Detection on GPUs

David Oro, Carles Fernandez, Carlos Segura, Xavier Martorell, Javier Hernando
Herta Security, Barcelona, Spain
41st International Conference on Parallel Processing, 2012

@article{oro2012accelerating,
   title={Accelerating Boosting-based Face Detection on GPUs},
   author={Oro, David and Fernandez, Carles and Segura, Carlos and Martorell, Xavier and Hernando, Javier},
   year={2012}
}

The goal of face detection is to determine whether faces are present in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on large SIMD widths to maximize data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. In particular, using concurrent kernel execution in combination with cascades generated by the GentleBoost algorithm solves the problem of GPU underutilization and achieves an average 5X speedup on 1080p videos over the fastest known implementations, while slightly improving accuracy. Finally, we also study the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task- and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.
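
The occupancy problem described in the abstract can be illustrated with a small sketch. This is a toy model, not the paper's kernel: the names `evaluate_cascade` and `simd_occupancy`, the scalar "window" values, the stage thresholds, and the lockstep group width are all assumptions made for illustration. It shows how early rejection saves work per window, and why windows that are evaluated in lockstep leave lanes idle whenever one window in the group survives longer than the others.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage representation: (score function, threshold).
# A window passes a stage when its score reaches the threshold.
Stage = Tuple[Callable[[float], float], float]


def evaluate_cascade(window: float, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Evaluate stages in order and stop at the first rejection.

    Returns (accepted, depth), where depth is the number of stages
    that were actually evaluated for this window.
    """
    depth = 0
    for score_fn, threshold in stages:
        depth += 1
        if score_fn(window) < threshold:
            return False, depth  # early exit: later stages are skipped
    return True, depth


def simd_occupancy(depths: Sequence[int], width: int = 32) -> float:
    """Fraction of lane-slots doing useful work under lockstep execution.

    Assumption: each group of `width` windows runs for as many stages as
    its deepest window, so lanes whose window was rejected earlier idle.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    useful = 0
    slots = 0
    for start in range(0, len(depths), width):
        group = depths[start:start + width]
        useful += sum(group)
        slots += width * max(group)
    return useful / slots


# Toy three-stage cascade over scalar "windows" (illustrative values only).
toy_stages: List[Stage] = [
    (lambda w: w, 0.2),
    (lambda w: w, 0.5),
    (lambda w: w, 0.8),
]
toy_depths = [evaluate_cascade(w, toy_stages)[1] for w in (0.1, 0.1, 0.1, 0.9)]
toy_occupancy = simd_occupancy(toy_depths, width=4)
```

In this toy run, three of the four windows are rejected at the first stage while the fourth passes all three, so the group keeps running for three stages and only half of the lane-slots do useful work. Real cascades reject most windows very early, which is the kind of divergence the abstract identifies as the cause of low ALU occupancy.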