Eliot: Interactively $\underline{E}$xploring Fast-Changing Scientific $\underline{Li}$terature Trends with $\underline{O}$nline Da$\underline{t}$a and Learning

Bernardo A. Denkvitts, Nitin Gupta, Biplav Srivastava

arXiv:2605.27610 · 36.4 · h-index: 4

Predicted impact: top 88% in IR · last 90 days · Originality: Incremental advance

AI Analysis

For researchers tracking rapidly evolving technical areas, Eliot provides an auditable, query-time clustering and temporal visualization tool that complements existing search and LLM-based assistants.

Eliot is an interactive system that retrieves and clusters arXiv papers at query time to help researchers explore fast-changing scientific literature. An offline configuration study across eight arXiv domains supports MiniLM embeddings, 10-dimensional UMAP, and Agglomerative Clustering as a practical default, and participants in a scenario-based survey rated cluster labels as meaningful in 85% of scenario responses.

The rapid growth of scientific publishing has made it increasingly difficult to track how fast-moving areas evolve. Search engines and LLM-based assistants retrieve or summarize papers, but often hide how the corpus was selected, organized, or connected to temporal patterns. We present $\texttt{Eliot}$, a publicly deployed interactive system for traceable exploration of evolving scientific literature. Motivated by two studies on Large Language Models (LLMs) and Automated Planning and Scheduling (APS), $\texttt{Eliot}$ generalizes literature-evolution analysis beyond hand-built taxonomies and domain-specific scripts. Given explicit query terms and filters, it retrieves arXiv papers at query time, represents each paper by title and abstract, clusters the corpus into themes, assigns representative keywords, and visualizes each cluster's publication-year distribution. We evaluate $\texttt{Eliot}$ as both an applied system and an interactive research aid. An offline configuration study across eight arXiv domains compares document representations, dimensionality reduction methods, and clustering algorithms using intrinsic clustering and topic-coherence metrics; the results support MiniLM embeddings with 10-dimensional UMAP and Agglomerative Clustering as a practical default. A scenario-based survey and expert focus group assess interpretability and use contexts: participants rated cluster labels as meaningful in 85% of scenario responses, and feedback indicated that $\texttt{Eliot}$ is most valuable for auditable overviews of rapidly changing technical areas. These results suggest that query-time clustering and temporal inspection can complement search and generation tools by helping researchers inspect and refine the evidence behind literature trends.
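The cluster, label, and inspect-by-year steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not Eliot's implementation: it swaps the MiniLM embeddings for plain TF-IDF vectors, skips the UMAP reduction entirely, and uses a naive average-linkage agglomerative loop that is only suitable for toy inputs. The sample `PAPERS` list, the stopword set, and every function name are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (title, publication year). Not data from the paper.
PAPERS = [
    ("Large language models for code generation", 2023),
    ("Prompting large language models with chain of thought", 2023),
    ("Scaling language models with instruction tuning", 2024),
    ("Heuristic search for classical planning", 2019),
    ("Temporal planning and scheduling with heuristic search", 2021),
    ("Classical planning heuristics for scheduling domains", 2022),
]
STOPWORDS = {"for", "with", "and", "the", "of", "in", "on", "to", "a", "an"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(texts):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per text.

    Assumption: TF-IDF stands in for the sentence embeddings used by Eliot.
    """
    counts = [Counter(tokenize(t)) for t in texts]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(texts)
    vectors = []
    for c in counts:
        v = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vectors.append({t: w / norm for t, w in v.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerative(vectors, k):
    """Merge the two most similar clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def top_keywords(vectors, members, n=3):
    """Label a cluster with the n terms of highest summed TF-IDF weight."""
    totals = {}
    for i in members:
        for term, weight in vectors[i].items():
            totals[term] = totals.get(term, 0.0) + weight
    return sorted(totals, key=totals.get, reverse=True)[:n]


def year_distribution(years, members):
    """Count the papers in a cluster per publication year, in year order."""
    return dict(sorted(Counter(years[i] for i in members).items()))


texts = [title for title, _ in PAPERS]
years = [year for _, year in PAPERS]
vectors = tfidf_vectors(texts)
for members in agglomerative(vectors, k=2):
    print(top_keywords(vectors, members), year_distribution(years, members))
```

Each printed line pairs a cluster's keyword label with its publication-year counts, which is the kind of per-theme temporal view the abstract describes. A closer reproduction would replace `tfidf_vectors` with sentence embeddings and add a dimensionality-reduction step before clustering.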
