Facial Recognition with Microsoft’s Project Oxford
Machine Learning has become increasingly popular in recent years, especially as part of the new discipline of data science. As a result, several Machine Learning tools and APIs (Application Programming Interfaces) have been made available to non-experts. Microsoft, for instance, has developed the Azure Machine Learning suite and has published several APIs on the Cortana Analytics Gallery that allow developers to work on Machine Learning with relative ease.
One of the APIs available on the Cortana Analytics Gallery is the Face API from Microsoft’s Project Oxford. Project Oxford is a collection of machine-learning-based APIs covering computer vision, speech recognition and natural language processing. The Face API can detect and recognize human faces in an image.
Our aim was to use the Face API in an app that detects people in a meeting room from a video feed and recognizes them against a collection of reference images. If each reference image is coupled with additional details about the person, those details can then be brought up on screen.
How our facial recognition app works…
In the app we created, faces are detected in a live webcam feed. At the press of a button, a frame is captured as an image and uploaded to the service through the API. The service returns a face object containing over two dozen detected facial ‘landmarks’, such as the positions of the eyes, nose, eyebrows and lips, along with attributes like age, gender, details on facial hair and whether the person in the image is smiling.
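To make the face object concrete, here is a sketch of parsing one detected face. The field names (`faceRectangle`, `faceLandmarks`, `faceAttributes`) follow the Face API’s documented response shape, but the sample values and the `summarise_face` helper are our own illustration, not output from a real call.

```python
import json

# A trimmed example of the JSON the detect call returns for one face.
# Field names follow the API's documented shape; the values are invented.
sample_response = json.loads("""
[{
  "faceId": "sample-face-id",
  "faceRectangle": {"left": 230, "top": 50, "width": 135, "height": 143},
  "faceLandmarks": {
    "pupilLeft":  {"x": 276.1, "y": 87.2},
    "pupilRight": {"x": 337.4, "y": 91.8},
    "noseTip":    {"x": 304.6, "y": 126.3}
  },
  "faceAttributes": {"age": 31.0, "gender": "male", "smile": 0.9}
}]
""")

def summarise_face(face):
    """Pull out the attributes our app displays for one detected face."""
    attrs = face["faceAttributes"]
    rect = face["faceRectangle"]
    return {
        "age": attrs["age"],
        "gender": attrs["gender"],
        "smiling": attrs["smile"] > 0.5,
        "size": (rect["width"], rect["height"]),
    }

for face in sample_response:
    print(summarise_face(face))
```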
Given two separate face objects, the API computes the similarity between the two faces and determines whether they belong to the same person. It returns a confidence value between 0 and 1 that quantifies the similarity of the two faces. Faces with a confidence above 0.5 are marked as belonging to the same person; below that, the two faces are treated as a non-match.
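That decision rule can be sketched as a small helper. The 0.5 cut-off is the one described above; the dictionaries mimic the shape of a verify response, with invented confidence values.

```python
def same_person(verify_response, threshold=0.5):
    """Interpret a verify result: a confidence above the threshold means
    the two face images are judged to belong to the same person."""
    return verify_response["confidence"] > threshold

# Invented sample responses, one clear match and one clear non-match.
match = {"isIdentical": True, "confidence": 0.82}
no_match = {"isIdentical": False, "confidence": 0.31}

print(same_person(match))     # True
print(same_person(no_match))  # False
```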
In our app, the still frame captured from the webcam feed is then compared to a set of reference images that were processed beforehand. We display each face detected in the captured image alongside its matched face from the reference set, together with the confidence value.
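The matching step amounts to scoring a captured face against every reference face and keeping the best result. This is a sketch under our own assumptions: `verify_confidence` is a hypothetical stand-in for a call to the verification service, here backed by a canned table of invented scores.

```python
def best_match(captured_id, reference_ids, verify_confidence, threshold=0.5):
    """Compare one captured face against every reference face and return
    the (reference_id, confidence) pair with the highest confidence, or
    None if nothing clears the threshold."""
    scored = [(ref, verify_confidence(captured_id, ref))
              for ref in reference_ids]
    ref, conf = max(scored, key=lambda pair: pair[1])
    return (ref, conf) if conf > threshold else None

# Canned stand-in for the verify call, keyed on face-id pairs (invented).
canned = {
    ("cap-1", "alice"): 0.82,
    ("cap-1", "bob"): 0.14,
    ("cap-2", "alice"): 0.22,
    ("cap-2", "bob"): 0.35,
}
confidence = lambda a, b: canned[(a, b)]

print(best_match("cap-1", ["alice", "bob"], confidence))  # ('alice', 0.82)
print(best_match("cap-2", ["alice", "bob"], confidence))  # None
```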
Detection of faces in the captured image works very well; it only misses faces that are half obscured, turned side-on or away from the camera, or too small. Project Oxford states a 36-pixel face size as the minimum for a face to be detected, which we found to be a reasonable estimate.
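A rough pre-filter for that size limit could look like the sketch below. The 36-pixel figure comes from Project Oxford’s stated minimum; the `detectable` helper itself is our own illustration, not part of the API.

```python
MIN_FACE_SIZE = 36  # pixels, the minimum face size stated by Project Oxford

def detectable(face_rectangle, min_size=MIN_FACE_SIZE):
    """Return True if a face rectangle meets the minimum detectable
    size on both sides."""
    return (face_rectangle["width"] >= min_size
            and face_rectangle["height"] >= min_size)

print(detectable({"width": 135, "height": 143}))  # True
print(detectable({"width": 30, "height": 28}))    # False
```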
Identification also works quite well and recognizes people very consistently. The matching algorithm seems to err on the side of caution, occasionally failing to match two face images that belong to the same person, though this is sporadic. It might partly be due to the low-quality webcam we have used so far.
The Project Oxford Face API gives us a very reliable facial recognition tool and shows that machine-learning-based computer vision is starting to mature. As more tools become available for computer vision, speech recognition and natural language processing, new possibilities will open up for developers and data scientists to automate processes that previously required a real person.
If you have any questions or suggestions, please comment below.
Written by Yorick Boheemen | Consultant, RedPixie | See his LinkedIn Profile