Master’s thesis: Facial Landmark Detection and Shape Modeling using Neural Networks

I wrote my master’s thesis as an exchange student at Carnegie Mellon University (CMU) in Pittsburgh, PA, USA. This was possible thanks to the CLICS exchange program, which allows KIT students to visit partner universities. At CMU, I worked with the MultiComp group, which belongs to the Language Technologies Institute (LTI), part of the School of Computer Science.

My thesis aimed to improve prediction accuracy in facial landmark detection. The English abstract:

Facial landmarks are distinctive points in human faces that are used for a variety of tasks such as facial expression analysis, lip reading or face recognition. The performance on these tasks depends heavily on the accuracy of the detected facial landmarks. It is challenging to accurately locate facial landmarks, especially on faces that are partially occluded by glasses, facial hair or other objects. In this work we introduce a new approach to tackle these challenges on unconstrained frontal and semi-frontal face images. The proposed solution is a new deep-learning-based algorithm built on the Stacked Hourglass Network, which has proven effective for human pose estimation, a task similar to facial landmark detection. The algorithm processes face images by repeatedly down- and upsampling the image and thus analyzes it at multiple scales. The Stacked Hourglass Network is trained using Wing loss and regresses coordinates using a Differentiable Spatial To Numerical Transform. Our algorithm outperforms current state-of-the-art solutions on the 300-W and Menpo datasets in terms of the point-to-point normalized error. Additionally, a neural Point Distribution Model is employed as a shape model that refines the predictions made by the Stacked Hourglass Network. Adding the Point Distribution Model reduces the prediction error on the inner facial landmarks of the challenging test set of 300-W even further. The Point Distribution Model achieves the biggest improvements on the inner landmarks of faces with strong head poses, whereas improving the predictions of landmarks on the face outline is more challenging.
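
The abstract names two ingredients of the training setup: the Differentiable Spatial To Numerical Transform (DSNT), which turns each predicted heatmap into a coordinate by taking its expected position, and Wing loss, which compares those coordinates with the ground truth. Below is a minimal NumPy sketch of both as described in their original papers. It is not the code from the thesis repository. The 64×64 heatmap size, the 68-landmark layout, the Wing-loss parameters w = 10 and ε = 2 (a common choice for pixel-unit coordinates), and all function names are assumptions for illustration. A real training loop would run this inside an autodiff framework rather than plain NumPy.

```python
import numpy as np


def softmax2d(logits):
    """Normalize each (H, W) heatmap so it is positive and sums to 1."""
    k = logits.shape[0]
    flat = logits.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.reshape(logits.shape)


def dsnt(heatmaps):
    """Expected (x, y) position under each normalized heatmap.

    Returns normalized coordinates in [-1, 1], one row per landmark.
    """
    _, h, w = heatmaps.shape
    # Pixel centres mapped to [-1, 1]: column j -> (2j - (w - 1)) / w
    xs = (2 * np.arange(w) - (w - 1)) / w
    ys = (2 * np.arange(h) - (h - 1)) / h
    x = (heatmaps * xs[None, None, :]).sum(axis=(1, 2))
    y = (heatmaps * ys[None, :, None]).sum(axis=(1, 2))
    return np.stack([x, y], axis=1)


def to_pixels(coords, h, w):
    """Invert the DSNT mapping: normalized [-1, 1] coordinates -> pixel indices."""
    x = (coords[:, 0] * w + w - 1) / 2
    y = (coords[:, 1] * h + h - 1) / 2
    return np.stack([x, y], axis=1)


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss: logarithmic for errors below w, shifted L1 above it."""
    err = np.abs(pred - target)
    c = w - w * np.log1p(w / eps)  # makes both pieces meet at |err| = w
    return np.where(err < w, w * np.log1p(err / eps), err - c).mean()


# Hypothetical example: 68 landmarks on 64x64 heatmaps from the last hourglass.
rng = np.random.default_rng(0)
logits = rng.normal(size=(68, 64, 64))
pred_px = to_pixels(dsnt(softmax2d(logits)), 64, 64)
target_px = rng.uniform(0, 63, size=(68, 2))
print(wing_loss(pred_px, target_px))
```

Because DSNT is just a weighted sum over the heatmap, it is differentiable end to end, so a coordinate loss such as Wing loss can be backpropagated directly into the hourglass without a separate heatmap target. The logarithmic part of Wing loss amplifies small and medium errors compared with L1, which is where fine landmark accuracy is decided.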

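The thesis uses a neural Point Distribution Model, and its architecture is not described in this post. To illustrate what a shape model contributes, the sketch below swaps in the classical linear PDM instead. PCA on training shapes yields a mean shape and a few modes of variation. A noisy prediction is then projected onto those modes, with each coefficient clamped to ±3 standard deviations, so implausible shapes are pulled back toward plausible ones. The sketch assumes the shapes are already aligned (no Procrustes step), and the synthetic data and function names are hypothetical.

```python
import numpy as np


def fit_pdm(shapes, n_modes):
    """Fit a linear PDM: mean shape plus the top principal modes of variation.

    shapes: (N, K, 2) array of training shapes, assumed already aligned.
    """
    x = shapes.reshape(len(shapes), -1)  # flatten to (N, 2K)
    mean = x.mean(axis=0)
    # SVD of the centred data gives the principal directions (rows of vt)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    modes = vt[:n_modes].T  # (2K, n_modes), orthonormal columns
    variances = s[:n_modes] ** 2 / (len(shapes) - 1)
    return mean, modes, variances


def refine(shape, mean, modes, variances, limit=3.0):
    """Project a predicted shape onto the model and clamp implausible parameters."""
    b = modes.T @ (shape.reshape(-1) - mean)  # shape parameters
    bound = limit * np.sqrt(variances)
    b = np.clip(b, -bound, bound)  # keep each mode within +-limit std devs
    return (mean + modes @ b).reshape(shape.shape)


# Hypothetical data: 200 synthetic 68-point shapes varying along 3 directions.
rng = np.random.default_rng(0)
k = 68
base = rng.uniform(-1, 1, size=2 * k)
directions = np.linalg.qr(rng.normal(size=(2 * k, 3)))[0]
coeffs = rng.normal(size=(200, 3)) * np.array([0.3, 0.2, 0.1])
train = (base + coeffs @ directions.T).reshape(200, k, 2)

mean, modes, variances = fit_pdm(train, n_modes=3)
noisy = train[0] + rng.normal(scale=0.05, size=(k, 2))
refined = refine(noisy, mean, modes, variances)
print(np.abs(noisy - train[0]).mean(), np.abs(refined - train[0]).mean())
```

Clamping the parameters is the standard Active Shape Model heuristic. It limits how far a single badly localized landmark (for example one hidden behind a hand) can distort the whole shape, at the cost of flattening genuinely unusual faces.
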
The full thesis is available as a PDF here. The code is available here. As this is research code, don’t expect thorough documentation or particularly clean code. 😉