Web Scale Image Retrieval based on Image Text Query Pair and Click Data
Sanjeev, Poudel Chhetri
Traditional text-based image retrieval has grown in importance through its widespread use in web image search engines such as Google, Yahoo, and Bing. Text-based image retrieval rests on the assumption that the text surrounding an image describes it. The input to such a system is a text query, and the output is a ranked set of images in which the most relevant results appear first. Its limitation is that, most of the time, the query text cannot describe the content of an image perfectly, because visual information is highly varied. The Microsoft Research Bing Image Retrieval Challenge aims to achieve cross-modal retrieval by ranking the relevance between query text terms and images. This thesis presents the approach of our team, MUVIS, to the Microsoft Research Bing Image Retrieval Challenge, which measures the relevance between web images and a query given in text form. The challenge is to develop an image-query pair scoring system that assesses how effectively the query terms describe the images. The provided dataset included a training set of more than 23 million clicked image-query pairs collected from the web over one year, as well as a manually labelled development set. For each image-query pair, a floating-point score was produced. The score reflected how relevant the query was as a description of the given image, with a higher number indicating higher relevance and vice versa. For any query, sorting its associated images by their scores produced the retrieval ranking. The system developed by the MUVIS team consisted of five modules. The two main modules were the text processing module and principal component analysis assisted perceptron regression with random sub-space selection. To enhance evaluation accuracy, three complementary modules were also developed: a face bank, a duplicate image detector, and optical character recognition. The regression module and the complementary modules all relied on the results returned by the text processing module. OverFeat features extracted from the text processing module results served as input to the principal component analysis assisted perceptron regression with random sub-space selection module, which further transformed the feature vectors. The relevance score for each query-image pair was obtained by comparing the features of the query image with those of the relevant training images, and the cumulative similarity was returned as the relevance of the requested image. For feature extraction in the face bank and duplicate image detector modules, we used the CMUVIS framework, a distributed computing framework for big data developed by the MUVIS group. Three runs were submitted for evaluation: “Master”, “Sub2”, and “Sub3”. Using the proposed approach, we reached a discounted cumulative gain of 0.5099 on the development set and 0.5116 on the test set. Our solution achieved fourth place in the Microsoft Research Bing Grand Challenge 2014 for the master submission and second place for the overall submission.
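The ranking and evaluation steps described above can be illustrated with a minimal sketch. It assumes a widely used form of discounted cumulative gain (gain 2^rel - 1 with a log2 position discount). The abstract does not state the exact formula, cut-off, or normalizer used in the challenge, so the function names `rank_images` and `dcg`, the parameters `k` and `normalizer`, and the integer label values are all hypothetical and serve only as illustration.

```python
import math


def rank_images(scores):
    """Return image ids ordered by descending relevance score.

    `scores` maps an image id to the floating-point score produced
    for one query (hypothetical data layout).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [image_id for image_id, _ in ordered]


def dcg(ranked_labels, k=25, normalizer=1.0):
    """Discounted cumulative gain over the top-k graded labels.

    Assumed form: normalizer * sum((2**rel - 1) / log2(i + 1)),
    with positions i starting at 1. The cut-off k and the normalizer
    are assumptions, not values taken from the thesis.
    """
    total = 0.0
    for i, rel in enumerate(ranked_labels[:k], start=1):
        total += (2 ** rel - 1) / math.log2(i + 1)
    return normalizer * total


# Toy example for a single query: scores from the system, labels from annotators.
scores = {"img_a": 0.2, "img_b": 0.9, "img_c": 0.5}
labels = {"img_a": 0, "img_b": 3, "img_c": 2}  # illustrative graded labels

ranking = rank_images(scores)
query_dcg = dcg([labels[image_id] for image_id in ranking])
```

In this sketch a higher score places an image earlier in the ranking, and highly relevant images contribute more to the total when they appear near the top, which is why the metric rewards a scoring system that orders images well rather than one that merely assigns accurate absolute scores.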