E27 Lab 2: Real Time Object Recognition
Ben Mitchell, Zach Pezzementi, Dan Crosta

Abstract
     In this lab, we created a system that takes real-time input from a camera and compares objects in the camera's field of view against a database of training objects in order to classify them. After thresholding and segmentation, we used the second-order moment, radial projections, the ratio of perimeter length to object area, and bending energy as features to model an object. This model was compared with a database of known models, and the relative similarity was used to assign a label to the novel object.

Thresholding and Segmentation
     Our thresholding for object detection is a simple, fixed threshold on the grayscale value (calculated by summing the three color values). A good threshold was determined empirically. We used the segmentation algorithm provided in Bruce Maxwell's segment.c. We also used a grassfire transform to grow and then shrink the thresholded regions in order to close small holes inside objects created by specularities or light-colored features (e.g., the word on the stop sign).
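     The sketch below illustrates both steps. It assumes an interleaved 8-bit RGB buffer, a dark object on a light background, and a city-block grassfire distance; the threshold value and routine names are placeholders rather than our actual code.

    /* Sketch of the thresholding and grassfire grow/shrink steps. It assumes an
     * interleaved 8-bit RGB buffer and a dark object on a light background; the
     * threshold value and routine names here are placeholders, not our actual code. */

    #define THRESH 600   /* hypothetical empirical cutoff on the summed color value (0..765) */

    /* Mark pixels whose summed color value falls below the threshold as object. */
    void threshold_image(const unsigned char *rgb, unsigned char *mask, int w, int h)
    {
        for (int i = 0; i < w * h; i++) {
            int gray = rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2];
            mask[i] = (gray < THRESH) ? 255 : 0;
        }
    }

    /* Two-pass grassfire: city-block distance from every pixel to the nearest
     * pixel whose mask value equals 'target' (255 = object, 0 = background). */
    static void grassfire(const unsigned char *mask, int *dist, int w, int h,
                          unsigned char target)
    {
        const int BIG = w + h;               /* larger than any possible distance */
        for (int y = 0; y < h; y++)          /* forward pass */
            for (int x = 0; x < w; x++) {
                int i = y * w + x, d = (mask[i] == target) ? 0 : BIG;
                if (x > 0 && dist[i - 1] + 1 < d) d = dist[i - 1] + 1;
                if (y > 0 && dist[i - w] + 1 < d) d = dist[i - w] + 1;
                dist[i] = d;
            }
        for (int y = h - 1; y >= 0; y--)     /* backward pass */
            for (int x = w - 1; x >= 0; x--) {
                int i = y * w + x;
                if (x < w - 1 && dist[i + 1] + 1 < dist[i]) dist[i] = dist[i + 1] + 1;
                if (y < h - 1 && dist[i + w] + 1 < dist[i]) dist[i] = dist[i + w] + 1;
            }
    }

    /* Grow the object regions by r pixels, then shrink them by r, closing holes
     * and gaps narrower than about 2*r (e.g. the letters on the stop sign). */
    void grow_then_shrink(unsigned char *mask, int *dist, int w, int h, int r)
    {
        grassfire(mask, dist, w, h, 255);    /* distance to nearest object pixel  */
        for (int i = 0; i < w * h; i++) mask[i] = (dist[i] <= r) ? 255 : 0;
        grassfire(mask, dist, w, h, 0);      /* distance to nearest background    */
        for (int i = 0; i < w * h; i++) mask[i] = (dist[i] > r) ? 255 : 0;
    }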

Feature Extraction
     We used four features in our final model: second-order moment, radial projections, the ratio of perimeter length to object area, and bending energy. We discovered that the bending energy is distorted by aliasing, and were forced to downweight it in our comparison function. We also attempted to implement the percentage of an oriented bounding box occupied by the object, as well as the aspect ratio of the oriented bounding box. However, our algorithm consistently failed to generate correct values, and we were unable to fix it, so we ended up not using these features. The code is still in imageFuncs.c, but the function that calculates an oriented bounding box is never called.
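     As an illustration, the sketch below computes two of these features from the labeled image produced by segmentation; the function names and the normalization shown are one reasonable choice and are not necessarily what imageFuncs.c does.

    /* Illustrative versions of two of the features, computed from the labeled
     * segmentation image. Function names and normalization are examples only. */

    /* Second-order central moments of region 'id', divided by area^2 so that
     * they do not depend on how large the object appears in the image. */
    void second_moments(const int *labels, int w, int h, int id,
                        double *mu20, double *mu02, double *mu11)
    {
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0, n = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y * w + x] == id) {
                    sx += x;  sy += y;
                    sxx += (double)x * x;  syy += (double)y * y;  sxy += (double)x * y;
                    n++;
                }
        if (n == 0) { *mu20 = *mu02 = *mu11 = 0; return; }
        double cx = sx / n, cy = sy / n;
        *mu20 = (sxx / n - cx * cx) / n;
        *mu02 = (syy / n - cy * cy) / n;
        *mu11 = (sxy / n - cx * cy) / n;
    }

    /* Ratio of perimeter length to object area: a region pixel counts toward
     * the perimeter if any of its 4-neighbors lies outside the region. */
    double perimeter_area_ratio(const int *labels, int w, int h, int id)
    {
        int area = 0, perim = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                if (labels[y * w + x] != id) continue;
                area++;
                if (x == 0 || x == w - 1 || y == 0 || y == h - 1 ||
                    labels[y * w + x - 1] != id || labels[y * w + x + 1] != id ||
                    labels[(y - 1) * w + x] != id || labels[(y + 1) * w + x] != id)
                    perim++;
            }
        return area ? (double)perim / area : 0.0;
    }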

Modeling and Database
     Our model contains an object name and a feature set (see Feature Extraction). There may be any number of objects in the database with the same name. Our database is simply a set of objects, each with its own name and feature set, saved to a file. A new object may be added to a database by calling:
         image <image to add object from> /dev/null add <object name>
There is a perl script in the src directory which generates a database (it assumes the images are in ../images/); it simply calls image ... add for each file in the images directory, stripping the filename to get an object name and adding that object to the database. Most of the code for the database is in objectdb.c and objectdb.h.
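     As a rough picture of this layout, the structures below sketch what a model and a database look like in memory; the type and field names are illustrative and are not necessarily the declarations in objectdb.h.

    /* Illustrative model/database layout; not the actual declarations in objectdb.h. */
    #define NAME_LEN  64
    #define N_RADIAL  16                     /* hypothetical number of radial samples */

    typedef struct {
        double second_moment;                /* normalized second-order moment        */
        double radial[N_RADIAL];             /* radial projection profile             */
        double perim_area_ratio;             /* perimeter length / object area        */
        double bending_energy;               /* downweighted during comparison        */
    } FeatureSet;

    typedef struct {
        char       name[NAME_LEN];           /* e.g. "stopsign"; names may repeat     */
        FeatureSet features;
    } ObjectModel;

    typedef struct {
        int          count;                  /* number of models in the file          */
        ObjectModel *models;                 /* loaded from / appended to the db file */
    } ObjectDB;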
     It is worth noting that we added the clip to the database because we had an image of it, but we were unable to test our system's ability to identify the clip because we could not find the physical object.

Object Comparison and Identification
     Objects were compared by computing a percent similarity for each feature and then combining these numbers in a weighted, normalized fashion. For example, the bending energy of object 1 would be divided by the bending energy of object 2 (or the inverse, depending on which was larger), and the result multiplied by a fixed weight; this was done for each feature. The weighted similarities are then multiplied together and divided by the product of the weights to normalize the value to a percent. A similarity of 1 means a perfect match, and implies that the image being tested is probably already in the database. A similarity of 0 means the objects are perfectly dissimilar, which is theoretically impossible. The weights were set by inspecting the consistency (reliability) of each feature: features with greater variance within objects or less variance across objects were downweighted relative to the features that carry more information.
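     As one concrete way to realize this kind of weighted, normalized combination, the sketch below takes a weighted geometric mean of the per-feature ratio similarities; the exact arithmetic and the hand-tuned weight values in our comparison function may differ.

    #include <math.h>

    /* One way to combine per-feature ratio similarities into a single score in
     * (0, 1]: a weighted geometric mean. Weight values passed in are placeholders. */

    /* Percent similarity of two positive scalar features: smaller over larger. */
    static double ratio_sim(double a, double b)
    {
        if (a <= 0.0 || b <= 0.0) return 0.0;
        return (a < b) ? a / b : b / a;
    }

    /* Weighted geometric mean of the per-feature similarities; returns 1.0 only
     * when every feature matches exactly, so a score near 1 suggests the test
     * image is essentially already in the database. */
    double model_similarity(const double *fa, const double *fb,
                            const double *weight, int n_feat)
    {
        double wsum = 0.0, log_sim = 0.0;
        for (int i = 0; i < n_feat; i++) {
            double s = ratio_sim(fa[i], fb[i]);
            if (s <= 0.0) return 0.0;        /* a totally dissimilar feature kills the score */
            wsum    += weight[i];
            log_sim += weight[i] * log(s);
        }
        return wsum > 0.0 ? exp(log_sim / wsum) : 0.0;
    }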

Extensions
     Our object recognition algorithm does not rely on color, and would work just as well on a black-and-white image. Even without color, it can correctly identify all of the objects it has been trained on, other than objects differentiated only by color (e.g., the black and red disks). Also, our algorithm performs the recognition in real time, printing its classification of each object in the image to standard output for every frame captured. The frame rate is generally between 15 and 20 frames per second, which is noticeably slower than that of the disp program, but is still fast enough to be useful. We use bending energy as a feature, but we do not use chain codes directly, because the aliasing in these images would render such a feature practically useless; it is not close enough to rotation invariant to be meaningful. We also used grassfire transforms to do the growing and shrinking.

Results
     The confusion matrix below contains 10 tests per object, so the percentage of correct classifications can be read directly from the diagonal.


Example images: blackDisk, redDisk, clip, cone, diskCase, pen, pliers, stopsign, jumpingToy, ethernetCase, bottleCap, cat, sunglasses, tapeDispenser, wireCutters

Processing-stage images: before grassfire transform, after grassfire grow, after grassfire shrink, major axis and perimeter, radial projection lines

obj in \ result (column order): black disk, red disk, clip, cone, disk case, pen, pliers, stop sign, jumping toy, ethernet case, sunglasses, wire cutters, bottle cap, tape dispenser, cat, reject

black disk        2 2 1 5
red disk          5 3 1 1
clip
cone              10
disk case         2 3 5
pen               7 1 2
pliers            10
stop sign         9 1
jumping toy       10
ethernet case     10
sunglasses        10
wire cutters      10
bottle cap        10
tape dispenser    10
cat               10
reject            1 1 1 3 4