Conventional information retrieval is based solely on text, and approaches to textual retrieval have been transplanted into image retrieval in a variety of ways, including the representation of an image as a vector of feature values drawn from different modalities. It is widely recognized that image retrieval techniques should integrate different modalities, such as color, texture, and associated text keywords. Taking a cue from text-based retrieval, we construct “visual keywords” by vector quantization of small image tiles. Visual and text keywords are then combined to represent each image as a single multimodal vector. We demonstrate the effectiveness of these multimodal image keywords for clustering and for retrieving relevant images from a large collection.
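To make the construction described above concrete, the following is a minimal sketch and not the authors' implementation. It assumes raw pixel values as tile features, plain k-means as the vector quantizer, a normalized histogram of visual keywords, and a binary text-keyword indicator. The tile size, codebook size, and all function names (`extract_tiles`, `build_codebook`, `quantize`, `multimodal_vector`) are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def extract_tiles(image, tile=8):
    """Cut an H x W x C image into non-overlapping tile x tile patches (flattened)."""
    h, w, c = image.shape
    h, w = h - h % tile, w - w % tile  # drop any ragged border
    patches = image[:h, :w].reshape(h // tile, tile, w // tile, tile, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (row, col, y, x, channel)
    return patches.reshape(-1, tile * tile * c).astype(np.float64)


def quantize(tiles, centers):
    """Assign each tile to its nearest codebook entry (squared Euclidean distance)."""
    dists = ((tiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def build_codebook(tiles, k=16, iters=10, seed=0):
    """Assumed quantizer: plain k-means; each center acts as one 'visual keyword'."""
    rng = np.random.default_rng(seed)
    centers = tiles[rng.choice(len(tiles), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(tiles, centers)
        for j in range(k):
            members = tiles[labels == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers


def multimodal_vector(image, text_keywords, centers, vocabulary, tile=8):
    """Concatenate a visual-keyword histogram with a text-keyword indicator vector."""
    labels = quantize(extract_tiles(image, tile), centers)
    visual = np.bincount(labels, minlength=len(centers)).astype(np.float64)
    visual /= max(visual.sum(), 1.0)  # normalize so image size does not matter
    text = np.array([float(word in text_keywords) for word in vocabulary])
    return np.concatenate([visual, text])


if __name__ == "__main__":
    # Synthetic stand-in data: random "images" with made-up keyword annotations.
    rng = np.random.default_rng(42)
    images = [rng.integers(0, 256, size=(64, 64, 3)) for _ in range(5)]
    annotations = [{"sky"}, {"beach", "sky"}, {"tree"}, set(), {"beach"}]
    vocabulary = ["beach", "sky", "tree"]

    all_tiles = np.vstack([extract_tiles(im) for im in images])
    codebook = build_codebook(all_tiles, k=16)
    vectors = np.array([
        multimodal_vector(im, kw, codebook, vocabulary)
        for im, kw in zip(images, annotations)
    ])
    print(vectors.shape)  # one row per image: 16 visual + 3 text dimensions
```

Once every image is a row of such a matrix, any vector-space method from text retrieval (for example, cosine similarity against a query vector, or clustering of the rows) can be applied directly. How the visual and text parts should be weighted against each other is left open here.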