Tagging Products using Image Classification

Brian Tomasik, Phyo Thiha, and Douglas Turnbull



Associating labels with online products can be a labor-intensive task. We study the extent to which a standard 'bag of visual words' image classifier can be used to tag products with useful information, such as whether a sneaker has laces or velcro straps. Using Scale-Invariant Feature Transform (SIFT) image descriptors at random keypoints, a hierarchical visual vocabulary, and a variant of nearest-neighbor classification, we achieve accuracies between 66% and 98% on 2- and 3-class classification tasks using several dozen training examples. Combining information from multiple views of the same product increases accuracy further.
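As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch quantizes local descriptors into visual words, represents each image as a word histogram, and classifies a product by nearest neighbor over several views. It is a minimal sketch under stated assumptions, not the method from the paper: synthetic vectors stand in for SIFT descriptors, a flat k-means vocabulary stands in for the hierarchical one, and plain per-class nearest-neighbor distances (summed across views) stand in for the nearest-neighbor variant and view-combination rule. All function names (`build_vocabulary`, `bow_histogram`, `class_distances`, `classify_product`, `fake_image`) are hypothetical.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Flat k-means (Lloyd's algorithm); an assumed stand-in for a hierarchical vocabulary."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Map each descriptor to its nearest visual word and return a normalized histogram."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

def class_distances(hist, train_hists, train_labels, n_classes):
    """L1 distance from one histogram to the nearest training histogram of each class."""
    d = np.abs(train_hists - hist).sum(axis=1)
    return np.array([d[train_labels == c].min() for c in range(n_classes)])

def classify_product(view_hists, train_hists, train_labels, n_classes):
    """Sum per-class nearest-neighbor distances over all views; the smallest total wins."""
    total = sum(class_distances(h, train_hists, train_labels, n_classes)
                for h in view_hists)
    return int(np.argmin(total))

# Synthetic demo data (assumption): class 0 descriptors cluster near 0, class 1 near 5.
rng = np.random.default_rng(1)

def fake_image(label, n=30, dim=8):
    return rng.normal(loc=5.0 * label, scale=0.5, size=(n, dim))

train_labels = np.array([0] * 6 + [1] * 6)
train_descs = [fake_image(c) for c in train_labels]
centers = build_vocabulary(np.vstack(train_descs), k=4)
train_hists = np.array([bow_histogram(d, centers) for d in train_descs])

# Three views of one product from class 1, classified jointly.
views = [bow_histogram(fake_image(1), centers) for _ in range(3)]
predicted = classify_product(views, train_hists, train_labels, n_classes=2)
```

With real images, the synthetic `fake_image` vectors would be replaced by descriptors extracted at sampled keypoints. Summing distances across views is only one simple way to pool evidence; voting over per-view predictions is another.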


Two-page SIGIR version (pdf, ps); 8-page technical report (pdf)


Feel free to contact any of the authors to obtain a copy of our data set of ~3,500 images.