Evaluating Memento Service Optimizations
This work addresses slow response times experienced by users of web archive services. The contribution is incremental: it is a long-term evaluation of optimizations introduced in previous work.
The paper tackles slow response times in Memento Aggregator services with a cache and with machine learning models that predict archival holdings. After two years in production, the cache achieves a 70-80% hit rate for human-driven services, and the predictions reach an average recall of 0.727.
Services and applications based on the Memento Aggregator can suffer from slow response times because the Memento infrastructure performs a federated search across web archives. To decrease response times, we established a cache system and experimented with machine learning models that predict archival holdings. We reported the experimental results in previous work. Now that these optimizations have been in production for two years, we can evaluate their efficiency based on long-term log data. We find that the cache is very effective, with a 70-80% cache hit rate for human-driven services. The machine learning prediction operates at an acceptable average recall of 0.727, but our results also show that more frequent retraining of the models is needed to further improve prediction accuracy.
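The two headline metrics can be illustrated with a minimal sketch. It assumes that each logged request carries a cache status of "HIT" or "MISS", that each prediction is recorded as a pair (predicted_held, actually_held) per archive, and that the average recall is an unweighted mean over archives. The log format, the function names, and the averaging scheme are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Mapping, Sequence, Tuple

# Hypothetical record of one prediction: (predicted_held, actually_held).
Prediction = Tuple[bool, bool]


def cache_hit_rate(statuses: Iterable[str]) -> float:
    """Fraction of requests answered from the cache (status == 'HIT')."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(s == "HIT" for s in statuses) / len(statuses)


def recall(pairs: Sequence[Prediction]) -> float:
    """Recall = TP / (TP + FN): share of actual holdings that were predicted."""
    tp = sum(1 for predicted, actual in pairs if predicted and actual)
    fn = sum(1 for predicted, actual in pairs if not predicted and actual)
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_recall(per_archive: Mapping[str, Sequence[Prediction]]) -> float:
    """Unweighted mean of per-archive recall (an assumed averaging scheme)."""
    values = [recall(pairs) for pairs in per_archive.values()]
    return sum(values) / len(values) if values else 0.0
```

Under these assumptions, a false negative is an archive that holds a copy but is skipped by the prediction, which is why recall is the natural measure here: a missed archive means a missed Memento for the user.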