E27/CS27 Lab 3: Content Based Image Retrieval

Dan Crosta
Ben Mitchell
Zach Pezzementi

Note: The file readme.cbir in the source directory contains information about building and running the project.

The first decision we made in our CBIR procedure was the color space over which our searches would operate. We chose CIE-LAB as our primary color space, because Euclidean distance in CIE-LAB corresponds closely to perceptual distance as judged by a human observer. The first task of every search is therefore the nonlinear conversion from RGB to this color space. Notably, only the chromaticity information is used for comparison: in an attempt to normalize for lighting conditions, we never use L. The program works in LAB by default, but can be switched to LUV by changing a macro. The rest of this explanation refers to calculations based on LAB values, but LUV values could just as easily be substituted.
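
For reference, the sketch below shows one standard way to perform this conversion for an 8-bit pixel, assuming sRGB input and a D65 white point, and keeping only the (a, b) chromaticity pair. It is not the project's actual conversion code, and the names rgbToAB and ABPixel are ours.

// A minimal sketch (not the lab's actual code): convert an 8-bit sRGB pixel
// to CIE-LAB and keep only the chromaticity (a, b), discarding L.
#include <cmath>

struct ABPixel { double a, b; };

static double srgbToLinear(double c) {
    // sRGB gamma expansion
    return (c <= 0.04045) ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

static double labF(double t) {
    // piecewise cube-root used by the CIE-LAB definition
    const double delta = 6.0 / 29.0;
    return (t > delta * delta * delta)
        ? std::cbrt(t)
        : t / (3.0 * delta * delta) + 4.0 / 29.0;
}

ABPixel rgbToAB(unsigned char r8, unsigned char g8, unsigned char b8) {
    double r = srgbToLinear(r8 / 255.0);
    double g = srgbToLinear(g8 / 255.0);
    double b = srgbToLinear(b8 / 255.0);

    // linear RGB -> XYZ (D65 primaries)
    double x = 0.4124 * r + 0.3576 * g + 0.1805 * b;
    double y = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    double z = 0.0193 * r + 0.1192 * g + 0.9505 * b;

    // normalize by the D65 white point
    const double Xn = 0.95047, Yn = 1.0, Zn = 1.08883;
    double fx = labF(x / Xn), fy = labF(y / Yn), fz = labF(z / Zn);

    // L = 116*fy - 16 would be computed here, but we never use it
    ABPixel p;
    p.a = 500.0 * (fx - fy);
    p.b = 200.0 * (fy - fz);
    return p;
}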

Our program calculates three fundamentally different characteristics for each image, and uses linear combinations of these as the two "methods" of image comparison. The three features are whole-image histograms (separate histograms of A and B), spatial variance of colors, and color neighborness information. Once our program has computed the histogram representation of each of these features for the sample image, it compares them to the histograms of all the database images using a simple histogram intersection method, as sketched below.
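
At the top level, the search amounts to scoring the query's features against the precomputed features of every database image and ranking by similarity. The sketch below illustrates that loop; the names Features, computeFeatures, compareFeatures, and search are hypothetical stand-ins for the project's routines, with trivial stub bodies so the example compiles.

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Placeholder feature bundle; the real project stores whole-image A/B
// histograms, spatial variance, and neighborness data here.
struct Features { std::vector<double> data; };

// Hypothetical stand-ins for the project's feature extraction and
// comparison routines (names and stub bodies are ours, not the lab's).
Features computeFeatures(const std::string& /*imagePath*/) { return Features(); }
double compareFeatures(const Features&, const Features&) { return 0.0; }

struct Match { std::string name; double score; };

// Compare the query image against every database image and rank the
// results by similarity (scores near 1.0 are the best matches).
std::vector<Match> search(const std::string& queryPath,
                          const std::vector<std::pair<std::string, Features>>& db) {
    Features query = computeFeatures(queryPath);
    std::vector<Match> results;
    for (const std::pair<std::string, Features>& entry : db)
        results.push_back({entry.first, compareFeatures(query, entry.second)});
    std::sort(results.begin(), results.end(),
              [](const Match& a, const Match& b) { return a.score > b.score; });
    return results;
}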

 

Feature Details

Whole-Image Histograms

This measurement is a direct comparison of the chromaticity (AB space) of the image. The A and B axes are each broken down into HIST_SIZE buckets (defined in "cbirFuncs.h"), and an accumulator keeps track of the number of pixels falling in each bucket along each dimension. Each of these histograms (one for A, one for B) is then compared, using longHistIntersect(...), against the corresponding A or B whole-image histogram of each database image. Finally, the amount of overlap is divided by the total size of the larger of the two histograms, so the result is a double between 0 and 1, with 1 being a perfect match and 0 being no match (all of our intersection functions return values in this form).
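
The sketch below shows the normalized intersection described here. It is a stand-in for longHistIntersect(...), not the project's actual code, and the HIST_SIZE value shown is only an assumed example.

// A minimal sketch of the normalized histogram-intersection comparison
// described above; hypothetical stand-in for longHistIntersect(...).
#include <algorithm>

const int HIST_SIZE = 64;  // assumed bucket count; really defined in cbirFuncs.h

double histIntersect(const long h1[HIST_SIZE], const long h2[HIST_SIZE]) {
    long overlap = 0, total1 = 0, total2 = 0;
    for (int i = 0; i < HIST_SIZE; ++i) {
        overlap += std::min(h1[i], h2[i]);  // shared mass in this bucket
        total1  += h1[i];
        total2  += h2[i];
    }
    long larger = std::max(total1, total2);
    // 1.0 = perfect match, 0.0 = no overlap at all
    return (larger == 0) ? 0.0 : (double)overlap / (double)larger;
}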

Spatial Variance of Colors

In order to calculate spatial variance and color neighborness, we found it easier to "compress" the information about a pixel's position in AB space into a single dimension. Toward this end, we built a matrix over AB space recording, for every possible RGB combination, which bucket its LAB value falls into (dividing the continuous space along the A and B axes into BUCKETS_A and BUCKETS_B buckets, respectively). Traversing this array, we assigned consecutive numbers to every bucket with a non-zero count; each such number became the "color id" for a particular range of AB values.
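
A minimal sketch of this id assignment, with assumed values for BUCKETS_A and BUCKETS_B and hypothetical names for the occupancy and id tables, might look like this:

// A sketch of the color-id assignment described above; colorCount and
// colorId are hypothetical names, and the bucket counts are assumed values.
const int BUCKETS_A = 32;
const int BUCKETS_B = 32;

// colorCount[i][j] > 0 iff some RGB triple maps to AB bucket (i, j)
int assignColorIds(const long colorCount[BUCKETS_A][BUCKETS_B],
                   int colorId[BUCKETS_A][BUCKETS_B]) {
    int nextId = 0;
    for (int i = 0; i < BUCKETS_A; ++i) {
        for (int j = 0; j < BUCKETS_B; ++j) {
            // occupied buckets get consecutive ids; empty buckets get -1
            colorId[i][j] = (colorCount[i][j] > 0) ? nextId++ : -1;
        }
    }
    return nextId;  // total number of distinct color ids
}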

We calculated the spatial variance as the Euclidean distance from the centroid of all occurrences of each color id, and then compared this information against the images in the database. Spatial variance measured across all colors can be misleading, since many of the images in the database have large background areas of roughly uniform color (particularly outdoor photos of objects on grass). To compensate, we attempted to let the user click on the dominant object of the image (in a GUI window), and then run a spatial variance histogram intersection only on the color id of the selected pixel and on nearby color ids (in AB space). The color ids by themselves say little about how similar two colors are, so we had to locate the nearby buckets in AB space (the selected bucket and its 8-connected neighbors), look up the color ids of those buckets, construct mini-histograms over these 9 buckets, and compare the mini-histograms. We were unable to track down a bug in this last step, so our program only compares spatial variance across all colors, rather than the more informative isolated measure. A sketch of the basic per-color variance computation follows.
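
This is a minimal sketch under our own assumptions: the ids vector holds the per-pixel color ids from the previous step, and "variance" is taken to be the mean Euclidean distance of each color's pixels from that color's centroid (the project may weight or normalize this differently).

// A sketch of the per-color spatial-variance measure; ids holds one color
// id per pixel in row-major order, and nColors is the number of ids.
#include <cmath>
#include <vector>

std::vector<double> spatialVariance(const std::vector<int>& ids,
                                    int width, int height, int nColors) {
    std::vector<double> sumX(nColors, 0.0), sumY(nColors, 0.0), var(nColors, 0.0);
    std::vector<long> count(nColors, 0);

    // first pass: centroid of each color id
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int id = ids[y * width + x];
            if (id < 0) continue;              // pixel fell in an empty bucket
            sumX[id] += x;
            sumY[id] += y;
            ++count[id];
        }

    // second pass: mean Euclidean distance from that centroid
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int id = ids[y * width + x];
            if (id < 0) continue;
            double cx = sumX[id] / count[id];
            double cy = sumY[id] / count[id];
            var[id] += std::hypot(x - cx, y - cy);
        }
    for (int id = 0; id < nColors; ++id)
        if (count[id] > 0) var[id] /= count[id];
    return var;
}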

Color Neighborness

The color neighborness feature measures the likelihood of two color ids appearing as 8-connected neighbors in the image. We calculate a co-occurrence matrix based on a 3x3 mask applied to each pixel: if the mask is centered on a pixel with color id A, and color ids B and C appear within the mask, then the matrix entries (A,B) and (A,C) are incremented accordingly. The color neighborness histograms are then compared using a two-dimensional histogram intersection function; a sketch of the co-occurrence count appears below.
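
The sketch uses the same assumed ids layout as in the variance sketch and a flattened nColors x nColors matrix; it is an illustration, not the project's actual code.

// A sketch of the 3x3 neighborness co-occurrence count described above.
#include <cstddef>
#include <vector>

void neighborCooccurrence(const std::vector<int>& ids,
                          int width, int height,
                          std::vector<long>& cooc, int nColors) {
    cooc.assign((std::size_t)nColors * nColors, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int center = ids[y * width + x];
            if (center < 0) continue;
            // visit the 8-connected neighbors under the 3x3 mask
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                        continue;
                    int nb = ids[ny * width + nx];
                    if (nb < 0) continue;
                    ++cooc[(std::size_t)center * nColors + nb];
                }
        }
}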

 

CBIR vs. Object Recognition

We found that CBIR was harder than object recognition, as far as accuracy is concerned (precision doesn't seem to have an analog in object recognition). One major difficulty of CBIR is that there is less structure to the features we can construct (for instance, it wouldn't be possible to segment the images with a simple threshold, compute bounding boxes, and use that information to compute projections, moments, and so on). Additionally, the search space is several orders of magnitude larger, which requires us to think more carefully about which features we want to compute, because some, like color neighborness, require a lot of storage space. We should note that we attempted to implement a compressed database scheme, but were unable to complete it before the deadline.

How good is histogram-based CBIR?

Our CBIR system had reasonably good accuracy on the sample images we ran it on, but fairly poor precision with both methods. That said, we were unable to implement several promising features (especially the limited spatial variance method) that might have improved the precision of our system considerably.

What were the greatest causes of error in the CBIR task?

The greatest cause of error is the essential ambiguity of the task itself. What counts as a correct match is rather subjective and difficult to quantify, so a method will often do exactly what it was designed to do and still fail to yield "good" results for every test case. It is very difficult to define algorithms that perform consistently well on a large variety of image types, and even more difficult to implement them. We also experienced some errors due to aliasing: to make our data structures fit within resource restrictions, we had to downsample our features, including the colorspace itself. Additionally, since many RGB values map to a relatively small area of the CIE-LAB space, our segmentation of the space into regions did not distribute the pixel counts evenly across the color buckets. Finally, as mentioned above, the variance measurement is not as robust as we would have liked, since the entire image is analyzed rather than the area of primary interest.

Would the system you designed scale well to a very large data base (e.g. the web)?

Our system, which allocates a database of a little over 1 gigabyte for under 1,000 images at 640x512 pixels, would not scale well to an image space as large as the internet (or even to one much larger than the 831 images we have). However, the compressed database was projected to use about 8 KB per image on average, which would greatly reduce the database size. On the other hand, the image processing time would not change significantly with the compressed database: we would spend less time reading the images, but more time re-inflating the compressed data into the full representation needed to compare images. On the Engin lab computers, it takes 3 to 4 minutes to compare an image to the database, and, assuming the search time scales linearly with the number of images, the delay would become prohibitive with a database of only a few thousand images.

Extensions

We implemented additional transforms for the RGB, XYZ, and LUV colorspaces, but did not include them in the database due to size constraints. One can switch between them by changing a #define, but we did not include results for anything but LAB.

We implemented measurements of spatial variance and color neighborness.

We attempted, but failed within the allotted time, to implement a GUI for isolating the area of interest in a search image, which would have allowed calculation of a more useful spatial variance feature.

I did research to try to do Blobworld. My lab partners laughed at me repeatedly.

Results

Fancy method: (spatial variance + [4 * color neighborness]) / 5
Simple method: whole-image histogram
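
For concreteness, a minimal sketch of the fancy-method combination, assuming each per-feature similarity score has already been normalized to [0, 1] as described in the feature sections (the function name is ours):

// A sketch of the "fancy" weighted combination listed above, assuming
// per-feature similarity scores already normalized to [0, 1].
double fancyScore(double spatialVarSim, double neighbornessSim) {
    // spatial variance weighted 1, color neighborness weighted 4
    return (spatialVarSim + 4.0 * neighbornessSim) / 5.0;
}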

The results we judged "accurate" are grouped first in the two pic.0002.ppm lists below; the accuracy and precision figures under each list count those matches. Note: we think the results in the latter two lists (for pic.0343.ppm) may be wrong, because the data do not seem to agree with the results obtained in earlier tests (i.e., the first two lists). We think we may have broken some code when adding comments between testing on pictures 2 and 343, but it is too late to go back and re-run the four tests now.

Matching pic.0002.ppm, fancy method:

Accurate matches:
pic.0001.ppm
pic.0002.ppm
pic.0003.ppm
pic.0006.ppm
pic.0012.ppm

Other matches:
pic.0056.ppm
pic.0064.ppm
pic.0206.ppm
pic.0244.ppm
pic.0346.ppm
pic.0398.ppm
pic.0400.ppm
pic.0579.ppm
pic.0580.ppm
pic.0697.ppm
pic.0733.ppm

Accuracy: 5/5 = 100%
Precision: 5/16 = 31%

Matching pic.0002.ppm, simple method:

Accurate matches:
pic.0001.ppm
pic.0002.ppm
pic.0003.ppm
pic.0006.ppm

Other matches:
pic.0007.ppm
pic.0008.ppm
pic.0011.ppm
pic.0035.ppm
pic.0041.ppm
pic.0049.ppm
pic.0056.ppm
pic.0057.ppm
pic.0060.ppm
pic.0083.ppm
pic.0114.ppm
pic.0172.ppm
pic.0182.ppm
pic.0192.ppm
pic.0193.ppm
pic.0196.ppm
pic.0202.ppm
pic.0206.ppm
pic.0226.ppm
pic.0264.ppm
pic.0305.ppm
pic.0308.ppm
pic.0311.ppm
pic.0322.ppm
pic.0337.ppm
pic.0339.ppm
pic.0340.ppm
pic.0346.ppm
pic.0363.ppm
pic.0381.ppm
pic.0396.ppm
pic.0398.ppm
pic.0400.ppm
pic.0444.ppm
pic.0460.ppm
pic.0535.ppm
pic.0544.ppm
pic.0573.ppm
pic.0574.ppm
pic.0580.ppm
pic.0582.ppm
pic.0584.ppm
pic.0609.ppm
pic.0615.ppm
pic.0616.ppm
pic.0622.ppm
pic.0628.ppm
pic.0655.ppm
pic.0658.ppm
pic.0661.ppm
pic.0733.ppm
pic.0738.ppm
pic.0745.ppm
pic.0883.ppm
pic.0948.ppm

Accuracy: 4/5 = 80%
Precision: 4/59 = 7%

Matching pic.0343.ppm, fancy method:

pic.0038.ppm
pic.0343.ppm
pic.0645.ppm

Accuracy: 1/7 = 14%
Precision: 1/3 = 33%

Matching pic.0343.ppm, simple method:

pic.0027.ppm
pic.0038.ppm
pic.0055.ppm
pic.0058.ppm
pic.0059.ppm
pic.0062.ppm
pic.0069.ppm
pic.0242.ppm
pic.0343.ppm
pic.0348.ppm
pic.0349.ppm
pic.0350.ppm
pic.0387.ppm
pic.0388.ppm
pic.0506.ppm
pic.0603.ppm
pic.0645.ppm
pic.0650.ppm

Accuracy: 2/7 = 28%
Precision: 2/18 = 11%