IR · Mar 26, 2017

Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned

arXiv:1703.08855v1 · 12 citations
AI Analysis

The paper offers practical insights for developers who use Lucene in recommender systems. Its contribution is incremental, though, because it reports lessons learned from an existing application.

The authors used Apache Lucene as a content-based-filtering recommender for scholarly literature in Docear. They found three patterns. Recommendations with relevance scores below 0.025 had lower click-through rates. Picking ten recommendations at random from the top 50 results reduced click-through rate by 15% compared with recommending the top 10. When Lucene returned fewer than 1,000 results, click-through rates were roughly half as high as when it returned 1,000 or more.
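Each of these comparisons amounts to grouping logged recommendations into buckets and dividing clicks by impressions within each bucket. The following is a minimal Python sketch under stated assumptions. The log format (`Impression` with `score`, `result_count`, `clicked`) and all function names are hypothetical. Only the 0.025 and 1,000 cut-offs come from the paper, and the sketch does not reflect Docear's actual evaluation pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Impression:
    """One displayed recommendation (hypothetical log record, not Docear's schema)."""
    score: float        # Lucene relevance score of the recommended document
    result_count: int   # total number of hits Lucene returned for the query
    clicked: bool       # whether the user clicked the recommendation


def ctr_by_bucket(
    impressions: Iterable[Impression],
    bucket: Callable[[Impression], str],
) -> Dict[str, float]:
    """Click-through rate (clicks / impressions) for each bucket label."""
    shown: Dict[str, int] = {}
    clicks: Dict[str, int] = {}
    for imp in impressions:
        label = bucket(imp)
        shown[label] = shown.get(label, 0) + 1
        clicks[label] = clicks.get(label, 0) + int(imp.clicked)
    return {label: clicks[label] / shown[label] for label in shown}


# Bucketing rules using the thresholds reported in the paper.
def by_score(imp: Impression) -> str:
    return "score<0.025" if imp.score < 0.025 else "score>=0.025"


def by_result_count(imp: Impression) -> str:
    return "hits<1000" if imp.result_count < 1000 else "hits>=1000"


def relative_change(baseline: float, variant: float) -> float:
    """Relative change of variant vs. baseline, e.g. -0.15 for a 15% drop.

    The baseline must be non-zero.
    """
    return (variant - baseline) / baseline
```

For example, `ctr_by_bucket(log, by_score)` yields one rate for each side of the 0.025 threshold. `relative_change(ctr_top10, ctr_random)` (both names hypothetical) expresses a comparison such as the reported 15% decrease as a negative fraction.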

For the past few years, we used Apache Lucene as the recommendation framework in our scholarly-literature recommender system of the reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above 0.025. Second, when we picked ten recommendations randomly from Lucene's top 50 search results, click-through rate decreased by 15% compared with recommending the top 10 results. Third, the number of returned search results tends to predict how high click-through rates will be: when Lucene returns fewer than 1,000 search results, click-through rates tend to be around half as high as when 1,000+ results are returned.
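The second lesson contrasts two ways of turning a ranked result list into ten recommendations. This Python sketch assumes that hits have already been retrieved and sorted by descending score as hypothetical `(doc_id, score)` pairs. It does not call Lucene's API. The `above_threshold` helper is also hypothetical. It shows one way to act on the first lesson, and the abstract does not say the authors deployed such a filter.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical hit representation: (document id, Lucene relevance score),
# assumed already sorted by descending score.
Hit = Tuple[str, float]


def top_k(hits: Sequence[Hit], k: int = 10) -> List[Hit]:
    """Baseline strategy: recommend the k highest-scoring hits."""
    return list(hits[:k])


def random_k_of_top_n(
    hits: Sequence[Hit],
    k: int = 10,
    n: int = 50,
    rng: Optional[random.Random] = None,
) -> List[Hit]:
    """Variant strategy: sample k distinct hits uniformly from the top n.

    The returned list is in random order, not score order.
    """
    rng = rng or random.Random()
    pool = list(hits[:n])
    return rng.sample(pool, min(k, len(pool)))


def above_threshold(hits: Sequence[Hit], min_score: float = 0.025) -> List[Hit]:
    """Hypothetical filter: keep only hits scoring at least min_score."""
    return [hit for hit in hits if hit[1] >= min_score]
```

Passing a seeded `random.Random` makes the variant reproducible, which helps when a user must see the same recommendation set across page reloads.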

