IR | Dec 24, 2020

A Frequency-Based Learning-To-Rank Approach for Personal Digital Traces

arXiv:2012.13114v11.6

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement in search accuracy for individuals trying to find their own heterogeneous digital data.

This paper addresses the challenge of searching small, heterogeneous personal digital traces by proposing a learning-to-rank approach using LambdaMART and frequency-based features. The method improves search accuracy compared to traditional search tools on both a public email collection and a real user's personal digital trace collection.

Personal digital traces are constantly produced by connected devices, internet services and online interactions. These traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, which makes it hard for users to interact with and search their own data. We adopt a multidimensional data model based on the six natural questions (what, when, where, who, why and how) to represent and unify heterogeneous personal digital traces. On top of this model, we propose a learning-to-rank approach that uses the state-of-the-art LambdaMART algorithm and frequency-based features. These features leverage the correlation between content (what), users (who), time (when), location (where) and data source (how) to improve the accuracy of search results. Because no personal training data is publicly available, we build our own training sets by combining known-item query generation techniques with an unsupervised ranking model (field-based BM25). Experiments over a publicly available email collection and a personal digital trace collection from a real user show that the frequency-based learning approach improves search accuracy compared with traditional search tools.
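As a rough illustration of what frequency-based features over the what/who/when/where/how dimensions might look like, the sketch below computes one feature per dimension for a query-trace pair. The abstract does not specify the feature definitions, so everything here is an assumption: the dictionary layout of a trace, the function names `build_frequency_tables` and `frequency_features`, and the choice of "mean relative frequency of shared values" as the feature. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical dimension names following the six-question model. "why" is
# left out because the abstract names no feature for that dimension.
DIMENSIONS = ("what", "who", "when", "where", "how")


def build_frequency_tables(traces):
    """Count, per dimension, in how many traces each value occurs."""
    tables = {dim: Counter() for dim in DIMENSIONS}
    for trace in traces:
        for dim in DIMENSIONS:
            # set() so a value repeated inside one trace is counted once
            tables[dim].update(set(trace.get(dim, [])))
    return tables


def frequency_features(query, trace, tables, n_traces):
    """Return one feature per dimension (assumed definition): the mean
    relative frequency of the values shared by the query and the trace,
    or 0.0 when they share nothing in that dimension."""
    features = []
    for dim in DIMENSIONS:
        shared = set(query.get(dim, [])) & set(trace.get(dim, []))
        if not shared:
            features.append(0.0)
            continue
        total = sum(tables[dim][value] for value in shared)
        features.append(total / (len(shared) * n_traces))
    return features


if __name__ == "__main__":
    # Toy collection, invented purely for illustration.
    traces = [
        {"what": ["budget", "meeting"], "who": ["alice"], "how": ["email"]},
        {"what": ["meeting"], "who": ["alice", "bob"], "how": ["calendar"]},
        {"what": ["photo"], "who": ["bob"], "where": ["paris"], "how": ["facebook"]},
    ]
    tables = build_frequency_tables(traces)
    query = {"what": ["meeting"], "who": ["alice"]}
    for trace in traces:
        print(frequency_features(query, trace, tables, len(traces)))
```

Feature vectors of this kind, paired with relevance labels, are the sort of input a LambdaMART-style ranker consumes. How the labels would be derived from the field-based BM25 scores is not detailed in the abstract and is not modeled here.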
