Language Processing for Image Retrieval

Thijs Westerveld (University of Twente)

This paper describes how techniques from Natural Language Processing
(NLP) can be used for image retrieval. Most WWW search engines nowadays
let you search for non-textual data by checking a 'must include
image' box. Our approach goes one step further: in addition to using
NLP techniques to index collateral text, we also use these techniques
to index the image content itself.

In text retrieval, words from the text are the obvious indexing terms, and document similarity is often computed by counting the number of words two documents have in common. But what are the terms in images, and how can image similarity be computed? In most image retrieval systems, low-level image features such as colour histograms, textures and edges are used as indexing terms. Image similarity is then defined, for example, as the distance between two images in a certain colour space. But are we really interested in these low-level features? Do we really want an image retrieval system to find images with the same number of red pixels as our query image?
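To make the low-level approach concrete, the following is a minimal sketch of colour-histogram similarity of the kind described above. The function names, bin count, and L1 distance are illustrative choices, not details from the paper:

```python
import numpy as np

def colour_histogram(pixels, bins=8):
    """Quantise RGB pixel values (0-255) into a normalised
    per-channel histogram; the concatenated bins serve as the
    image's low-level 'indexing terms'."""
    pixels = np.asarray(pixels)
    hist = np.concatenate([
        np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
        for c in range(3)  # R, G, B channels
    ]).astype(float)
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (0 = identical
    colour distribution)."""
    return float(np.abs(h1 - h2).sum())
```

Under such a measure, an image of a red flower and an image of a red car can come out as near-identical, which is exactly the weakness the question above points at.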

The more interesting and more challenging type of information to search for is semantic content. In both text retrieval and image retrieval we are interested in the content of a document, rather than in the words or image features that describe this content. We use latent semantic indexing (LSI) to combine words and image features into one conceptual space, thus capturing the underlying semantics of both text and images.
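The combination step can be sketched as follows. LSI is standardly computed with a truncated singular value decomposition; here the term-by-document matrix stacks word rows and image-feature rows, so words and features that co-occur end up near each other in the concept space. The toy matrix and labels are illustrative assumptions, not data from the paper:

```python
import numpy as np

def lsi_project(term_doc, k=2):
    """Project documents into a k-dimensional latent concept space
    via truncated SVD. Rows of term_doc are indexing terms (words
    *and* image features); columns are documents."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates in concept space: columns of S_k @ Vt_k,
    # returned as one row per document.
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(a, b):
    """Cosine similarity between two concept-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows: word 'sunset', word 'beach', feature 'orange-hist',
# feature 'blue-hist'. Columns: a text-only document, a captioned
# image, and an uncaptioned image.
A = np.array([[1., 1., 0.],
              [1., 0., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
P = lsi_project(A, k=2)  # one 2-d concept vector per document
```

Because the captioned image shares the word 'sunset' with the text document and the feature 'orange-hist' with the uncaptioned image, it sits between them in the concept space, letting a textual query reach purely visual documents.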