Finding a robust yet compact representation of an object is a difficult task in computer vision. One strategy is to isolate the class of images of an object from the set of all images. If images are treated as vectors, the universal image space is the set of all possible images. The task is then to find the subspace of this image space that best represents the object. For faces, this subspace is called the face space.
For this system, each image must be a standard size, centered on the model object, and normalized. As additional preprocessing, all of the model images are averaged into a single average image, which is then subtracted from each model image to create a difference image. An SVD routine then identifies the best eigenvectors for the face space. This routine takes as input a matrix whose columns are the difference images and outputs a matrix containing the eigenvectors. The eigenvectors with the greatest eigenvalues are then selected to form the basis of the face space. The graph below shows the singular values (the square roots of the eigenvalues) associated with the eigenvectors. The singular values are on the y-axis, and the rank order (greatest to smallest) is on the x-axis. Based on this graph, 20 vectors were chosen.
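The preprocessing and SVD steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the original routine: the function name `build_face_space`, the 16x16 image size, and the random stand-in images are all assumptions made for the example.

```python
import numpy as np

def build_face_space(model_images, k=20):
    """Return (mean_image, basis, singular_values) for a stack of model images.

    model_images: array of shape (n_images, height, width), assumed to be
    already cropped to a standard size, centered, and normalized.
    """
    n = model_images.shape[0]
    # Treat each image as a vector: one column per image.
    A = model_images.reshape(n, -1).T.astype(float)    # (n_pixels, n_images)
    mean_image = A.mean(axis=1, keepdims=True)         # the average image
    D = A - mean_image                                 # difference images
    # Thin SVD: the columns of U are orthonormal eigenvectors of D @ D.T,
    # and s holds the singular values sorted from greatest to smallest.
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    return mean_image.ravel(), U[:, :k], s[:k]

# Synthetic stand-in data (hypothetical): 54 random 16x16 "images".
rng = np.random.default_rng(0)
images = rng.random((54, 16, 16))
mean_image, basis, singular_values = build_face_space(images, k=20)
```

Plotting the full vector of singular values against its index would give a graph of the kind described above, from which a cutoff such as 20 can be read off.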

Here are the top five eigenvectors.
[Figure: images of the top five eigenvectors]
To project a difference image onto the face space, the image is dotted with each basis vector of the face space, producing a set of 20 coefficients. Each test image was likewise dotted with the basis vectors to produce its own set of coefficients. The Euclidean distance between these coefficients and those of each model was then calculated, and the model at the smallest distance was taken as the one that most resembled the test image.
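The projection and matching step might look like the sketch below. It assumes the average image is subtracted from test images as well as model images, and it uses a small random orthonormal basis in place of the real face space; `project` and `nearest_model` are hypothetical names.

```python
import numpy as np

def project(image_vec, mean_image, basis):
    """Coefficients of an image in the face space, one per basis vector."""
    # Each coefficient is the dot product of the difference image
    # with one basis vector (one column of `basis`).
    return basis.T @ (image_vec - mean_image)

def nearest_model(test_coeffs, model_coeffs):
    """Index of, and Euclidean distance to, the closest model coefficients."""
    dists = np.linalg.norm(model_coeffs - test_coeffs, axis=1)
    best = int(np.argmin(dists))
    return best, float(dists[best])

# Toy stand-in data (hypothetical): 4 models, 64 pixels, 5 basis vectors.
rng = np.random.default_rng(1)
n_pixels, k = 64, 5
basis, _ = np.linalg.qr(rng.standard_normal((n_pixels, k)))  # orthonormal columns
mean_image = rng.random(n_pixels)
models = rng.random((4, n_pixels))
model_coeffs = np.array([project(m, mean_image, basis) for m in models])

# A test image that is a slightly perturbed copy of model 2.
test_image = models[2] + 0.001 * rng.standard_normal(n_pixels)
match_index, match_distance = nearest_model(
    project(test_image, mean_image, basis), model_coeffs
)
```

With 20 basis vectors, each image is reduced to 20 numbers, so comparing a test image against every model is inexpensive.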
To determine whether an image is a face at all, the distance between the image and its projection onto the face space was calculated. A large distance indicates that the image is not a face.
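A minimal sketch of this face test follows, assuming an orthonormal basis and a cutoff chosen by hand. The source does not give a threshold, so the `threshold` parameter and the value used here are hypothetical.

```python
import numpy as np

def distance_from_face_space(image_vec, mean_image, basis):
    """Distance between a difference image and its projection onto the face space."""
    diff = image_vec - mean_image
    coeffs = basis.T @ diff            # coordinates in the face space
    reconstruction = basis @ coeffs    # projection back in pixel space
    return float(np.linalg.norm(diff - reconstruction))

def is_face(image_vec, mean_image, basis, threshold):
    """Treat an image as a face when it lies close enough to the face space."""
    return distance_from_face_space(image_vec, mean_image, basis) <= threshold

# Toy stand-in data (hypothetical).
rng = np.random.default_rng(2)
n_pixels, k = 64, 5
basis, _ = np.linalg.qr(rng.standard_normal((n_pixels, k)))
mean_image = rng.random(n_pixels)

# One vector lying exactly in the face space...
in_space = mean_image + basis @ rng.standard_normal(k)
# ...and one pushed 3 units away along a direction orthogonal to the basis.
v = rng.standard_normal(n_pixels)
v -= basis @ (basis.T @ v)
off_space = in_space + 3.0 * v / np.linalg.norm(v)

near = distance_from_face_space(in_space, mean_image, basis)
far = distance_from_face_space(off_space, mean_image, basis)
```

Adding the average image back to `reconstruction` gives the reconstructed image in pixel space.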
Shown below is the confusion matrix for the set of images taken on the same day as the model images, along with several non-face images. The labels on the columns are the same as the labels on the rows.
Brandon    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Charlie    0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Dave       0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Eli        0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Eric       0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Evan       0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Jane       0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Jesse      0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Jordan     0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Kuan       0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Laura      0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Luis       0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Matt       0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
Maxwell    0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Nik        0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Paul       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
Stephanie  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Suor       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Non-face   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4

The system correctly identified all of the images in this set.
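For reference, a confusion matrix of this kind can be tallied as in the sketch below. The source does not say whether rows hold the true identity or the matched identity, so the convention here (rows are true labels, columns are matched labels) is an assumption, as are the function name and the three-label toy data.

```python
import numpy as np

def confusion_matrix(true_labels, matched_labels, names):
    """Count how often each true label (row) was matched to each label (column)."""
    index = {name: i for i, name in enumerate(names)}
    counts = np.zeros((len(names), len(names)), dtype=int)
    for truth, match in zip(true_labels, matched_labels):
        counts[index[truth], index[match]] += 1
    return counts

# Toy data (hypothetical), using a few of the labels from the table.
names = ["Brandon", "Charlie", "Non-face"]
truth = ["Brandon", "Charlie", "Non-face", "Non-face"]
matched = ["Brandon", "Charlie", "Non-face", "Non-face"]
counts = confusion_matrix(truth, matched, names)

# Every image is correctly identified exactly when all counts lie on the diagonal.
all_correct = bool(counts.trace() == counts.sum())
```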
[Image table: Input Image | Reconstructed Image | Matched To]
Below is the confusion matrix for a set of pictures taken on another day, when the subjects were wearing different clothes and, in some cases, had different hair.
Brandon    0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2
Charlie    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Dave       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Eli        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Eric       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Evan       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Jane       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Jesse      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Jordan     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Kuan       0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0
Laura      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Luis       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Matt       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Maxwell    0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 1
Nik        0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0
Paul       0 1 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0
Stephanie  0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
Suor       0 0 0 0 0 1 0 0 0 0 2 0 0 0 0 0 0 1 0
Non-face   0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[Image table: Input Image | Reconstructed Image | Matched To]
The system worked well for the images that were taken with the same hair and clothing as the model images. However, the results were poor for the images in which these characteristics had changed. The system could be improved by setting all pixels below the chin and above the forehead to a uniform gray, so that the matches would focus on facial features. A set of 54 images is also small for forming a face space. Training on more pictures with more expressions and more variety in clothing would make the system more robust.
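The proposed masking could be sketched as below. This assumes that, because the images are a standard size and centered, the forehead and chin fall at known rows. The row indices, the gray value of 0.5 (for intensities normalized to the range 0 to 1), and the function name are all hypothetical.

```python
import numpy as np

def mask_outside_face(image, forehead_row, chin_row, gray=0.5):
    """Set all rows above the forehead and below the chin to a uniform gray."""
    masked = image.astype(float).copy()
    masked[:forehead_row, :] = gray      # hair and background above the forehead
    masked[chin_row + 1:, :] = gray      # neck and clothing below the chin
    return masked

# Toy 6x6 "image" with intensities in [0, 1]; rows 1 to 4 are kept.
image = np.arange(36, dtype=float).reshape(6, 6) / 35.0
masked = mask_outside_face(image, forehead_row=1, chin_row=4)
```

The same mask would need to be applied to the model images before the face space is computed and to every test image before projection, so that both are compared on the same pixels.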