Facial Feature Tracking with SO-CLM

Facial Feature Detection and Tracking with Approximated Structured Output Learning based Constrained Local Model

Shuai Zheng1, Paul Sturgess1, Philip H. S. Torr1

1Brookes Vision Group (now Torr Vision Group at the University of Oxford)


An approximated structured output learning approach is developed to learn the CLM appearance model over time on low-powered portable devices, such as the iPad 2, iPhone 4S, and Galaxy S2.

Facial feature detection and tracking are important in many applications, such as face recognition and face animation.



What has been done

Existing facial feature detectors, such as the tree-structured SVM and Constrained Local Models (CLMs), achieve state-of-the-art accuracy on many benchmarks (e.g. CMU Multi-PIE, BioID). However, for applications on low-powered devices, the trade-off among accuracy, speed, and memory cost becomes the main concern.


How to make facial feature detection efficient in speed and memory

There are two ways to address the speed-up problem. One is to use GPU (e.g. CUDA) and parallel computing (e.g. OpenMP) techniques to accelerate existing algorithms (e.g. AAM, CLM). The other is to improve the steps inside existing algorithms, or to develop a new algorithm. In this paper, we explore how to speed up facial feature detection with an approach called approximate structured output learning for constrained local models.


What we did

Within this paper we examine the learning of the appearance model in the Constrained Local Model (CLM) technique. We make two contributions: firstly, we examine an approximate method for structured learning, which jointly learns the appearances of all the landmarks. Even though this method has no guarantee of optimality, we find it performs better than training the appearance models independently. It also allows for efficient online learning of a particular instance of a face. Secondly, we use a binary approximation of our learnt model that, when combined with binary features, leads to efficient inference at runtime using bitwise AND operations. We quantify the generalization performance of our approximate SO-CLM by training the model parameters on a single dataset and testing on a total of five unseen benchmarks.
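The binary approximation idea can be sketched as follows. This is an illustrative sketch, not the paper's implementation: a real-valued linear appearance filter w is greedily decomposed into a few binary basis vectors in {-1,+1}^n with scalar coefficients, so that the filter response on a binary feature vector reduces to bitwise ANDs and popcounts. The function names and the greedy residual-fitting scheme are assumptions for illustration.

```python
# Illustrative sketch (not the paper's code): approximate a real-valued
# weight vector w by k binary basis vectors b_i in {-1,+1}^n with scalar
# coefficients beta_i, so that w . x for a binary feature x can be
# evaluated with bitwise AND and popcount alone.

def popcount(x):
    # Number of set bits in an integer bitmask.
    return bin(x).count("1")

def binarize_weights(w, k):
    """Greedily fit w ~ sum_i beta_i * b_i with b_i in {-1,+1}^n.

    Each b_i is stored as a bitmask of its +1 positions (the -1
    positions are the complement). Returns [(beta_i, pos_mask_i), ...].
    """
    n = len(w)
    residual = list(w)
    basis = []
    for _ in range(k):
        # b = sign(residual); beta = <residual, b> / n minimises the
        # squared error of this greedy step, and <residual, b> is just
        # the sum of absolute residuals.
        pos_mask = 0
        for j, r in enumerate(residual):
            if r >= 0:
                pos_mask |= 1 << j
        beta = sum(abs(r) for r in residual) / n
        basis.append((beta, pos_mask))
        for j in range(n):
            b_j = 1.0 if (pos_mask >> j) & 1 else -1.0
            residual[j] -= beta * b_j
    return basis

def fast_response(basis, x_mask, n):
    """Approximate w . x for a binary feature x (bitmask of set entries).

    For b in {-1,+1}^n and x in {0,1}^n:
      b . x = popcount(pos_mask & x) - popcount(~pos_mask & x)
    so only AND and popcount are needed at runtime.
    """
    all_ones = (1 << n) - 1
    score = 0.0
    for beta, pos_mask in basis:
        pos = popcount(pos_mask & x_mask)
        neg = popcount((all_ones ^ pos_mask) & x_mask)
        score += beta * (pos - neg)
    return score
```

At runtime only `fast_response` is executed per candidate landmark position, which is why the response map computation stays cheap on low-powered hardware.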

The speed at runtime is demonstrated on the iPad 2 platform. Our results clearly show that the proposed system runs in real time, yet still performs at state-of-the-art levels of accuracy.


[1] S. Zheng, P. Sturgess, and P. H. S. Torr, “Approximate Structured Output Learning for Constrained Local Models with Application to Real-time Facial Feature Detection and Tracking on Low-power Devices”, IEEE Conference on Automatic Face and Gesture Recognition (FG), 2013. [pdf][bib][poster][ppt][ResultsFig][IEEE Xplore]


[1] Demo Program. [7.2M Win64 Executable Program][Win32][Linux][Mac]

FAQ: you can get the detection results for an image by typing “BrookesFaceTracker.exe gaga.png”; typing “BrookesFaceTracker.exe” with no arguments runs the demo.


Frontal Facial Landmark Annotation Dataset [link]


This project is supported by EPSRC grant EP/I001107/1.


Related Links

[1] S. Hare, A. Saffari, and P. Torr, “Efficient Online Structured Output Learning for Keypoint-based Object Tracking”, CVPR, 2012. [paper & C++ code]

[2] X. Zhu and D. Ramanan, “Face Detection, Pose Estimation and Landmark Localization in the Wild”, CVPR, 2012. [project]

[3] Struct SVM. [project]

[4] Struct SVM in Matlab. [project]

[5] Flandmark. [project]

[6] CI2CV. [Website]

[7] CLM-Wild [Website]

[8] FacePlusPlus [Website]

Data Links

[1] BioID http://www.bioid.com/index.php?q=downloads/software/bioid-face-database.html

[2] CMU Multi-PIE http://www.flintbox.com/public/project/4742/

[3] XM2VTS http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/

[4] LFPW http://www.kbvt.com/LFPW/

[5] Talking face http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html

[6] 300 Faces In-the-Wild [iBUG website]