Evaluating the retrieval effectiveness of Web search engines using a representative query sample
This research gives users and researchers a more robust methodology for evaluating search engine performance. The contribution is incremental, as it builds on existing evaluation practices.
The study addresses the small scale of typical search engine retrieval effectiveness studies by using a representative sample of 2,000 queries (1,000 informational and 1,000 navigational) to compare Google and Bing. Google outperformed Bing on both query types, most notably on navigational queries (95.3% vs. 76.6% correct answers).
Search engine retrieval effectiveness studies are usually small-scale, using only limited query samples. Furthermore, the queries are selected by the researchers themselves. We address these issues by taking a random representative sample of 1,000 informational and 1,000 navigational queries from a major German search engine and comparing Google's and Bing's results based on this sample. Jurors were recruited through crowdsourcing, and data was collected using specialised software, the Relevance Assessment Tool (RAT). We found that while Google outperforms Bing on both query types, the difference in performance for informational queries was rather small. For navigational queries, however, Google found the correct answer in 95.3 per cent of cases, whereas Bing found it in only 76.6 per cent. We conclude that search engine performance on navigational queries is of great importance, because for this query type users can clearly identify whether a query has returned the correct result. Performance on navigational queries may therefore help to explain user satisfaction with search engines.
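
To give a sense of how large the reported navigational gap is relative to sampling noise, the following minimal sketch computes the percentage-point difference and a Wilson score interval for each engine. It is an illustration under stated assumptions, not the authors' analysis: it assumes that all 1,000 navigational queries were evaluated for both engines (`N_ASSUMED`), and the function name `wilson_interval` and the 95% level (`z = 1.96`) are choices made here. Both engines were judged on the same queries, so a paired test would be more appropriate for a formal comparison.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumption: every one of the 1,000 navigational queries was judged for both engines.
N_ASSUMED = 1000

# Success rates on navigational queries as reported in the abstract.
reported_rates = {"Google": 0.953, "Bing": 0.766}

# Interval per engine, based on the success counts implied by the assumed sample size.
intervals = {
    engine: wilson_interval(round(rate * N_ASSUMED), N_ASSUMED)
    for engine, rate in reported_rates.items()
}

# Gap between the two engines in percentage points.
gap_points = (reported_rates["Google"] - reported_rates["Bing"]) * 100

if __name__ == "__main__":
    for engine, (low, high) in intervals.items():
        print(f"{engine}: {reported_rates[engine]:.1%} (interval {low:.1%} to {high:.1%})")
    print(f"Gap: {gap_points:.1f} percentage points")
```

Under these assumptions the two intervals do not overlap, which is consistent with the abstract's reading that the navigational difference is substantial.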