E27 Lab 2: Real Time Object Recognition
Ben Mitchell, Zach Pezzementi, Dan Crosta

Abstract
     In this lab, we created a system that takes real-time input from a camera and compares objects in the camera's field of view against a database of training objects in order to classify them. After thresholding and segmentation, we used the second-order moment, radial projections, the ratio of perimeter length to object area, and bending energy as features to model an object. This model was compared with a database of known models, and the relative similarity was used to assign a label to the novel object.

Thresholding and Segmentation
     Our thresholding for object detection is a simple, fixed threshold on the grayscale value (calculated by summing the three color values). A good threshold was determined empirically. We used the segmentation algorithm provided in Bruce Maxwell's segment.c. We also used a grassfire transform to grow and then shrink the thresholded regions in order to close small holes inside objects created by specularities or light-colored features (e.g., the word on the stop sign).
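     The sketch below illustrates both steps. It assumes an interleaved 8-bit RGB buffer, a dark object on a light background, and a city-block grassfire distance; the threshold value and routine names are placeholders rather than our actual code.

    /* Sketch of the thresholding and grassfire grow/shrink steps. It assumes an
     * interleaved 8-bit RGB buffer and a dark object on a light background; the
     * threshold value and routine names here are placeholders, not our actual code. */

    #define THRESH 600   /* hypothetical empirical cutoff on the summed color value (0..765) */

    /* Mark pixels whose summed color value falls below the threshold as object. */
    void threshold_image(const unsigned char *rgb, unsigned char *mask, int w, int h)
    {
        for (int i = 0; i < w * h; i++) {
            int gray = rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2];
            mask[i] = (gray < THRESH) ? 255 : 0;
        }
    }

    /* Two-pass grassfire: city-block distance from every pixel to the nearest
     * pixel whose mask value equals 'target' (255 = object, 0 = background). */
    static void grassfire(const unsigned char *mask, int *dist, int w, int h,
                          unsigned char target)
    {
        const int BIG = w + h;               /* larger than any possible distance */
        for (int y = 0; y < h; y++)          /* forward pass */
            for (int x = 0; x < w; x++) {
                int i = y * w + x, d = (mask[i] == target) ? 0 : BIG;
                if (x > 0 && dist[i - 1] + 1 < d) d = dist[i - 1] + 1;
                if (y > 0 && dist[i - w] + 1 < d) d = dist[i - w] + 1;
                dist[i] = d;
            }
        for (int y = h - 1; y >= 0; y--)     /* backward pass */
            for (int x = w - 1; x >= 0; x--) {
                int i = y * w + x;
                if (x < w - 1 && dist[i + 1] + 1 < dist[i]) dist[i] = dist[i + 1] + 1;
                if (y < h - 1 && dist[i + w] + 1 < dist[i]) dist[i] = dist[i + w] + 1;
            }
    }

    /* Grow the object regions by r pixels, then shrink them by r, closing holes
     * and gaps narrower than about 2*r (e.g. the letters on the stop sign). */
    void grow_then_shrink(unsigned char *mask, int *dist, int w, int h, int r)
    {
        grassfire(mask, dist, w, h, 255);    /* distance to nearest object pixel  */
        for (int i = 0; i < w * h; i++) mask[i] = (dist[i] <= r) ? 255 : 0;
        grassfire(mask, dist, w, h, 0);      /* distance to nearest background    */
        for (int i = 0; i < w * h; i++) mask[i] = (dist[i] > r) ? 255 : 0;
    }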

Feature Extraction
     We used four features in our final model: second-order moment, radial projections, the ratio of perimeter length to object area, and bending energy. We discovered that the bending energy is distorted by aliasing, and were forced to downweight it in our comparison function. We also attempted to implement the percentage of an oriented bounding box occupied by the object, as well as the aspect ratio of the oriented bounding box. However, our algorithm consistently failed to generate correct values, and we were unable to fix it, so we ended up not using these features. The code is still in imageFuncs.c, but the function that calculates an oriented bounding box is never called.
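     As an illustration, the sketch below computes two of these features from the labeled image produced by segmentation; the function names and the normalization shown are one reasonable choice and are not necessarily what imageFuncs.c does.

    /* Illustrative versions of two of the features, computed from the labeled
     * segmentation image. Function names and normalization are examples only. */

    /* Second-order central moments of region 'id', divided by area^2 so that
     * they do not depend on how large the object appears in the image. */
    void second_moments(const int *labels, int w, int h, int id,
                        double *mu20, double *mu02, double *mu11)
    {
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0, n = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y * w + x] == id) {
                    sx += x;  sy += y;
                    sxx += (double)x * x;  syy += (double)y * y;  sxy += (double)x * y;
                    n++;
                }
        if (n == 0) { *mu20 = *mu02 = *mu11 = 0; return; }
        double cx = sx / n, cy = sy / n;
        *mu20 = (sxx / n - cx * cx) / n;
        *mu02 = (syy / n - cy * cy) / n;
        *mu11 = (sxy / n - cx * cy) / n;
    }

    /* Ratio of perimeter length to object area: a region pixel counts toward
     * the perimeter if any of its 4-neighbors lies outside the region. */
    double perimeter_area_ratio(const int *labels, int w, int h, int id)
    {
        int area = 0, perim = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                if (labels[y * w + x] != id) continue;
                area++;
                if (x == 0 || x == w - 1 || y == 0 || y == h - 1 ||
                    labels[y * w + x - 1] != id || labels[y * w + x + 1] != id ||
                    labels[(y - 1) * w + x] != id || labels[(y + 1) * w + x] != id)
                    perim++;
            }
        return area ? (double)perim / area : 0.0;
    }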

Modeling and Database
     Our model contains an object name and a feature set (see Feature Extraction). There may be any number of objects in the database with the same name. Our database is simply a set of objects, each with its own name and feature set, saved to a file. A new object may be added to a database by calling:
         image <image to add object from> /dev/null add <object name>
There is a perl script in the src directory which generates a database (it assumes the images are in ../images/); it simply calls image ... add for each file in the images directory, stripping the filename to get an object name and adding that object to the database. Most of the code for the database is in objectdb.c and objectdb.h.
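     As a rough picture of this layout, the structures below sketch what a model and a database look like in memory; the type and field names are illustrative and are not necessarily the declarations in objectdb.h.

    /* Illustrative model/database layout; not the actual declarations in objectdb.h. */
    #define NAME_LEN  64
    #define N_RADIAL  16                     /* hypothetical number of radial samples */

    typedef struct {
        double second_moment;                /* normalized second-order moment        */
        double radial[N_RADIAL];             /* radial projection profile             */
        double perim_area_ratio;             /* perimeter length / object area        */
        double bending_energy;               /* downweighted during comparison        */
    } FeatureSet;

    typedef struct {
        char       name[NAME_LEN];           /* e.g. "stopsign"; names may repeat     */
        FeatureSet features;
    } ObjectModel;

    typedef struct {
        int          count;                  /* number of models in the file          */
        ObjectModel *models;                 /* loaded from / appended to the db file */
    } ObjectDB;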
     It is worth noting that we added the clip to the database because we had an image of it, but we were unable to test our system's ability to identify the clip because we could not find the physical object.

Object Comparison and Identification
     Objects were compared by computing a percent similarity for each feature and then combining these numbers in a weighted, normalized fashion. For example, the bending energy of object 1 would be divided by the bending energy of object 2 (or the inverse, depending on which was larger), and the result multiplied by a fixed weight; this was done for each feature. The weighted similarities are then multiplied together and divided by the product of the weights to normalize the value to a percent. A similarity of 1 means a perfect match, and implies that the image being tested is probably already in the database. A similarity of 0 means the objects are perfectly dissimilar, which is theoretically impossible. The weights were set by inspecting the consistency (reliability) of each feature: features with greater variance within objects or less variance across objects were downweighted relative to the features that carry more information.
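     As one concrete way to realize this kind of weighted, normalized combination, the sketch below takes a weighted geometric mean of the per-feature ratio similarities; the exact arithmetic and the hand-tuned weight values in our comparison function may differ.

    #include <math.h>

    /* One way to combine per-feature ratio similarities into a single score in
     * (0, 1]: a weighted geometric mean. Weight values passed in are placeholders. */

    /* Percent similarity of two positive scalar features: smaller over larger. */
    static double ratio_sim(double a, double b)
    {
        if (a <= 0.0 || b <= 0.0) return 0.0;
        return (a < b) ? a / b : b / a;
    }

    /* Weighted geometric mean of the per-feature similarities; returns 1.0 only
     * when every feature matches exactly, so a score near 1 suggests the test
     * image is essentially already in the database. */
    double model_similarity(const double *fa, const double *fb,
                            const double *weight, int n_feat)
    {
        double wsum = 0.0, log_sim = 0.0;
        for (int i = 0; i < n_feat; i++) {
            double s = ratio_sim(fa[i], fb[i]);
            if (s <= 0.0) return 0.0;        /* a totally dissimilar feature kills the score */
            wsum    += weight[i];
            log_sim += weight[i] * log(s);
        }
        return wsum > 0.0 ? exp(log_sim / wsum) : 0.0;
    }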

Extensions
     Our object recognition algorithm does not rely on color, and would work just as well on a black-and-white image. Even without color, it can correctly identify all of the objects it has been trained on, other than objects differentiated only by color (e.g., the black and red disks). Also, our algorithm performs the recognition in real time, printing its classification of each object in the image to standard output for every frame captured. The frame rate is generally between 15 and 20 frames per second, which is noticeably slower than that of the disp program, but is still fast enough to be useful. We use bending energy as a feature, but we do not use chain codes directly, because the aliasing in these images would render such a feature practically useless; it is not close enough to rotation invariant to be meaningful. We also used grassfire transforms to do the growing and shrinking.

Results
     The confusion matrix below contains 10 tests per object, so the percentage of correct classifications can be read directly from the diagonal.


Example images: blackDisk, redDisk, clip, cone, diskCase, pen, pliers, stopsign, jumpingToy, ethernetCase, bottleCap, cat, sunglasses, tapeDispenser, wireCutters

Processing-stage images: before grassfire transform, after grassfire grow, after grassfire shrink, major axis and perimeter, radial projection lines

obj in \ result (column order): black disk, red disk, clip, cone, disk case, pen, pliers, stop sign, jumping toy, ethernet case, sunglasses, wire cutters, bottle cap, tape dispenser, cat, reject

black disk        2 2 1 5
red disk          5 3 1 1
clip
cone              10
disk case         2 3 5
pen               7 1 2
pliers            10
stop sign         9 1
jumping toy       10
ethernet case     10
sunglasses        10
wire cutters      10
bottle cap        10
tape dispenser    10
cat               10
reject            1 1 1 3 4