Yesterday my wife was shopping for lipstick, and she saw an interesting feature on the manufacturer's website: it shows you how a shade of lipstick would look on you.
Using your webcam, you can see a video of yourself, and when you select a shade, your face instantly wears that same shade of lipstick. Time to try on that black one for Halloween!
This cosmetics website is a good example of how a simple-looking application can bring together sophisticated pieces of software from different fields of computer science.
The first step for the app is to detect your face.
Even as babies, our own brains can quickly detect faces, but it is hard for a computer program to do the same in a digital picture made of hundreds of thousands of dots. Simple methods don't work. The lighting could be poor; the faces could be large or small, near or far, turned in any direction, or partly obscured.
Face detection has only recently become practical, for the usual reasons: computers have become fast enough, and newer software techniques in computer vision can make use of the hardware. Even your phone camera can now detect faces for focusing on human subjects.
The website performs this detection so quickly that it can track your face within the webcam image as you move this way and that.
Within the face, a different software program now locates a few dozen "landmarks": the ends and centers of your eyebrows, the line of your nose from the bridge to the tip, the centers of your pupils, and so on.
The purpose of locating these facial landmarks is twofold. First, it helps the software to estimate the pose of your head, i.e., which way your head is turned. For example, if the camera sees your right eye move toward the left eye, reducing the distance between your pupils, it can guess that you have turned your head a little to look toward your left.
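To make the pupil example concrete, here is a minimal sketch in Python. It assumes a simplified camera with no perspective effects, so the apparent distance between the pupils shrinks with the cosine of the turn angle. The function name and the pixel values are made up for illustration; the website's actual method is not known and very likely uses many landmarks rather than two.

```python
import math

def estimate_yaw_degrees(frontal_distance, observed_distance):
    """Estimate how far the head has turned, in degrees.

    frontal_distance: pupil-to-pupil distance (pixels) when facing the camera.
    observed_distance: pupil-to-pupil distance (pixels) in the current frame.
    Assumption: apparent distance = frontal distance * cos(turn angle).
    """
    if frontal_distance <= 0:
        raise ValueError("frontal_distance must be positive")
    ratio = observed_distance / frontal_distance
    # Clamp to [0, 1] so that measurement noise cannot break acos().
    ratio = max(0.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))

# Hypothetical numbers: pupils 60 pixels apart when frontal, 52 pixels now.
angle = estimate_yaw_degrees(60.0, 52.0)
print(f"Head turned by roughly {angle:.0f} degrees")
```

The distance alone gives only the size of the turn. To decide whether you turned left or right, a program would need further cues, such as which way the tip of the nose has shifted relative to the eyes.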
Second, it can estimate the shape of your face in three dimensions: like a molded rubber mask, as opposed to a flat picture.
Several hundred pictures of actual human faces have had similar landmarks marked on them. Using these varied faces, the software has been programmed with a 3-D shape model of a face. This model can be varied to match your facial landmarks, the way a police sketch artist can try to match a suspect based on a witness's description.
The program now fits the shape model to your unique facial landmarks. As you move your head, the model rotates and aligns to match your current pose. The shape model knows that your face can distort in certain ways: for example, you could purse your lips, open your mouth, etc. If you do any of these things, the software discovers that some of the landmarks on your face have left their original positions. It quickly varies the model to match your current expression.
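The fitting step can be pictured with a toy version of such a model. The sketch below is an assumption-laden simplification: it works in two dimensions, uses only four mouth landmarks, and allows a single way of deforming ("mouth opening"). Given the landmarks seen in the current frame, it finds the amount of deformation that best matches them in the least-squares sense. All names and coordinates are hypothetical.

```python
def fit_mode_weight(mean_shape, mode, observed):
    """Find the weight w that makes mean_shape + w * mode closest to observed.

    Each argument is a list of (x, y) pairs, one pair per landmark.
    This is the closed-form least-squares answer for one deformation mode.
    """
    numerator = 0.0
    denominator = 0.0
    for (mx, my), (dx, dy), (ox, oy) in zip(mean_shape, mode, observed):
        numerator += (ox - mx) * dx + (oy - my) * dy
        denominator += dx * dx + dy * dy
    if denominator == 0:
        raise ValueError("the deformation mode must move at least one landmark")
    return numerator / denominator

def apply_mode(mean_shape, mode, weight):
    """Return the model landmarks for a given deformation weight."""
    return [(mx + weight * dx, my + weight * dy)
            for (mx, my), (dx, dy) in zip(mean_shape, mode)]

# Landmarks: left corner, right corner, upper-lip center, lower-lip center.
closed_mouth = [(0.0, 0.0), (10.0, 0.0), (5.0, 2.0), (5.0, -2.0)]
# "Mouth opening" moves the two lip centers apart and leaves the corners alone.
opening_mode = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

# Hypothetical landmarks detected in the current frame.
detected = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0), (5.0, -5.0)]

weight = fit_mode_weight(closed_mouth, opening_mode, detected)
fitted = apply_mode(closed_mouth, opening_mode, weight)
```

A realistic model would have many such modes, learned from the marked-up example faces, plus the rotation and position of the head, but the idea of adjusting a few numbers until the model lines up with the landmarks stays the same.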
These kinds of operations are studied in a field called geometric modeling.
The software creates geometric surfaces in the shape of your lips and cloaks the appropriate part of your face model with these surfaces.
It now remains only to choose the appropriate tint and texture for the lipstick swatch. A third program calculates how these colors would look on your lips and superimposes the appropriately colored dots on your webcam image.
The science of generating ("rendering") images from 3-D models is called computer graphics or CG.
What you see is a combination of your actual photographic image and this synthetic swatch. This kind of mixed-mode experience is referred to as augmented reality or AR.
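The combination of photograph and synthetic swatch can be pictured as a per-dot blend. The sketch below is only an illustration under simple assumptions: the image is a small grid of red-green-blue values, a mask marks which dots belong to the lips, and the lipstick is mixed in at a fixed opacity. A real renderer would also account for lighting, gloss, and texture; the names and colors here are hypothetical.

```python
def blend_pixel(skin_rgb, lipstick_rgb, opacity):
    """Mix a lipstick color into one pixel. opacity runs from 0.0 to 1.0."""
    return tuple(
        round((1.0 - opacity) * skin + opacity * tint)
        for skin, tint in zip(skin_rgb, lipstick_rgb)
    )

def apply_tint(image, lip_mask, lipstick_rgb, opacity):
    """Return a new image with the tint applied wherever lip_mask is True.

    image: list of rows, each row a list of (r, g, b) tuples.
    lip_mask: list of rows of booleans with the same dimensions as image.
    """
    return [
        [blend_pixel(pixel, lipstick_rgb, opacity) if on_lips else pixel
         for pixel, on_lips in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, lip_mask)
    ]

# A tiny 1 x 2 "webcam frame": one cheek pixel and one lip pixel.
frame = [[(220, 180, 160), (200, 100, 100)]]
mask = [[False, True]]
black_lipstick = (0, 0, 0)

tinted = apply_tint(frame, mask, black_lipstick, 0.5)
```

Keeping the opacity below 1.0 lets some of the original photograph show through, which is one simple way to keep the natural shading of the lips under the new color.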
So, what looks like a webcam video is an elaborate dance of computer vision, geometric modeling, and computer graphics, all for a quick makeover.