Blog: Leveraging neural networks to create facial recognition mobile apps


The eyes may be the window to the soul, but your face is the key to your phone. While facial recognition may have once been relegated to the world of science fiction, it is now our reality and plays such a common role in our everyday lives that few bother to think about the enormous technology and infrastructure behind it. For instance, Apple’s FaceID technology on the iPhone X utilizes a ‘TrueDepth’ camera system that projects over 30,000 dots onto a user in order to map the geometry of the user’s face. But Apple isn’t the only company that employs facial recognition: Samsung uses it to unlock devices, and Facebook uses it to tag people in photos. Our development team has always been fascinated with the concept of facial recognition, so they decided to tackle this once-mythical technology and put it to work for our customers.

How do facial recognition systems work?

Our previous experience with facial recognition solutions was fairly simple; it revolved around tracking users’ faces and detecting smiles and blinks. We have now gone further and created an application that not only detects faces but also matches them against photos in a client’s database. But how did we do it?

Facial recognition works by measuring the unique characteristics of a user’s face and then comparing the source data with an existing database of entries. So, in order to perform biometric authentication, you need to represent the user’s face as a multidimensional array of numbers. Here’s how we approached this task.

We use a picture, or even a video frame, as the source input. As the first step, we need to determine the part of the image that contains the face. Modern mobile operating systems provide developers with built-in tools to detect a face in a picture. However, we leveraged a neural network for this task because it delivers better results and faster image processing than the typical iOS tools.

Preparing the source image

After we calculate the coordinates of the face’s position and its size, we crop the original image. During testing, we found that adding an extra margin (5 to 10 percent of the originally detected face size) produces better recognition results. We then cropped the image and scaled it to a reasonable size. Again, we determined empirically that scaling the image down to a resolution of 300×300 pixels works best: in our tests, higher-resolution images took longer to process but delivered no qualitative improvement.
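The margin-and-clamp step above can be sketched in plain C++. This is an illustrative helper, not the app’s actual code; the `Rect` type and function name are our own for this example:

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned rectangle in pixel coordinates.
struct Rect {
    int x, y, w, h;
};

// Expand a detected face rectangle by `margin` (e.g. 0.1 for 10%)
// on every side, then clamp the result to the image bounds so the
// crop never reads outside the source bitmap.
Rect expand_and_clamp(Rect face, double margin, int img_w, int img_h) {
    int dx = static_cast<int>(face.w * margin);
    int dy = static_cast<int>(face.h * margin);
    int x0 = std::max(0, face.x - dx);
    int y0 = std::max(0, face.y - dy);
    int x1 = std::min(img_w, face.x + face.w + dx);
    int y1 = std::min(img_h, face.y + face.h + dy);
    return Rect{x0, y0, x1 - x0, y1 - y0};
}
```

For a 200×200 face detected at (100, 100) in a 1000×1000 image, a 10% margin yields a 240×240 crop starting at (80, 80), which is then rescaled to 300×300 before being fed to the second network.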

Detecting the user’s face landmarks

In the second step, we used the cropped and resized image as the input for a second neural network. It extracts the face landmarks as a 128-dimensional array that exhaustively describes the user’s face. This multidimensional array remains essentially the same even when a user smiles, wears glasses, or adds makeup. After we extract these 128 numbers, we can compare them with the face landmarks of other users. The picture below shows how these landmarks are placed on the user’s face and how they are stored in the application.
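Because the network’s output is a fixed-size vector of 128 numbers, storing it in a client database is straightforward. A minimal sketch, assuming a raw-bytes (BLOB-style) layout; the `FaceDescriptor` alias and helper names here are illustrative, not the app’s actual schema:

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <vector>

// A face descriptor: the 128 floats produced by the embedding network.
using FaceDescriptor = std::array<float, 128>;

// Serialize the descriptor to raw bytes (e.g. for a database BLOB column).
std::vector<unsigned char> to_bytes(const FaceDescriptor& d) {
    std::vector<unsigned char> bytes(sizeof(float) * d.size());
    std::memcpy(bytes.data(), d.data(), bytes.size());
    return bytes;
}

// Restore a descriptor from the stored bytes.
FaceDescriptor from_bytes(const std::vector<unsigned char>& bytes) {
    FaceDescriptor d{};
    std::memcpy(d.data(), bytes.data(), sizeof(float) * d.size());
    return d;
}
```

Keeping the array at a fixed 512 bytes per face makes database rows compact and comparison against many stored entries cheap.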

[Image: face landmarks placed on the user’s face and their numeric representation in the application]

Comparing users

The comparison operation determines the similarity between the input face landmarks and the landmarks of other users, stored in the database. This process is called face verification.
Since the landmarks are stored as multidimensional arrays, we can use the Euclidean distance between them to measure the similarity of two faces. We also need to set a threshold below which two faces are considered to belong to the same person. The lower the threshold, the stricter the match and the fewer false positives. We set the threshold at 0.3, which ensures a remarkable 99.8% accuracy of the face recognition. The following code shows the procedure that we use to find the matching faces.

// Pair up all faces whose descriptors are closer than the 0.3 threshold.
std::vector<sample_pair> edges;
for (size_t i = 0; i < face_descriptors.size(); ++i)
{
    for (size_t j = i; j < face_descriptors.size(); ++j)
    {
        // length() computes the Euclidean distance between two descriptors.
        if (length(face_descriptors[i] - face_descriptors[j]) < 0.3)
            edges.push_back(sample_pair(i, j));
    }
}
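Stripped of the dlib types, the check at the heart of that loop is just a Euclidean distance against the threshold. A self-contained sketch in plain C++ (the function names are ours; dlib expresses the same thing as `length(a - b)`):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using FaceDescriptor = std::array<float, 128>;

// Euclidean distance between two 128-D face descriptors —
// the plain-C++ equivalent of dlib's length(a - b).
double euclidean_distance(const FaceDescriptor& a, const FaceDescriptor& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double diff = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Two faces are treated as the same person when their descriptors
// lie closer together than the chosen threshold (0.3 in our setup).
bool same_person(const FaceDescriptor& a, const FaceDescriptor& b,
                 double threshold = 0.3) {
    return euclidean_distance(a, b) < threshold;
}
```

The nested loop above applies this test to every pair of detected faces, so faces of the same person end up connected by edges and can then be grouped together.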

Technological best practices

As a result, we crafted two facial recognition mobile applications: one for iOS and one for Android. We used the open-source dlib library and two pre-trained neural networks to find faces in pictures and extract the face landmarks.

The iOS app uses a cloud application server as its back end. In this case, the face recognition speed is blazing fast and the application can deliver results in real time, but it requires an Internet connection. Our Android application recognizes users locally, without an Internet connection. We use the CPU to execute all operations quickly, but we can opt to use multiple GPU cores to improve performance further.

Today, this application is being used to recognize DB Best employees. However, we are constantly looking for new ways to update and improve the capabilities of all of our applications. By integrating with the Facebook or LinkedIn APIs, we gain access to a wide range of facial recognition technology and samples. In a future update, we are planning to add Optical Character Recognition (OCR) capabilities to correctly identify users by scanning their conference badge or business card.

But what’s the point of facial recognition?

As we came to the end of our research, we started to imagine just how far we could go with our facial recognition technology. Facial recognition is quickly becoming a key feature of mobile technology and a critical element of many applications. Imagine being able to unlock your car with your face and then have it automatically adjust the seat, steering wheel, and even radio settings for that driver. What about determining just who is sitting at a family PC, and then showing ads tailored to that person? And what about public safety and security: controlling who can, and cannot, enter a business’s conference room, or spotting a criminal in a crowd of thousands of people? The possibilities are endless, and the DB Best application development team is ready to put its skills to good use.

Today, this technology is primarily used for security purposes, but that’s just the beginning. Imagine how we can enhance the way we interact with everyday objects just by adding a dash of facial recognition. Pablo Picasso once asked, “Who sees the human face correctly: the photographer, the mirror, or the painter?” Today, I’m happy to say that we can finally answer him: it’s the DB Best application development team. If you want to finally see faces correctly, get in contact with us today!