FLeW: Facet-Level and Adaptive Weighted Representation Learning of Scientific Documents
This work is aimed at researchers and practitioners who rely on scientific document representations. It offers a more integrated approach, though the contribution appears incremental because it builds on existing methods.
The paper tackles scientific document representation learning by proposing FLeW, a method that unifies contrastive training, fine-grained representation, and task-aware learning to produce better embeddings. The authors report improved applicability and robustness across multiple scientific tasks and fields compared to prior models.
Scientific document representation learning provides powerful embeddings for various tasks, but each of the three current approaches faces challenges. 1) Contrastive training with citation-structural signals underutilizes citation information and still generates single-vector representations. 2) Fine-grained representation learning, which generates multiple vectors at the sentence or aspect level, requires costly integration and lacks domain generalization. 3) Task-aware learning depends on manually predefined task categorization, overlooking nuanced task distinctions and requiring extra training data for task-specific modules. To address these problems, we propose FLeW, a new method that unifies the three approaches for better representations. Specifically, we introduce a novel triplet sampling method that leverages citation intent and frequency to enhance citation-structural signals for training. Citation intents (background, method, result), aligned with the general structure of scientific writing, facilitate a domain-generalized facet partition for fine-grained representation learning. Then, we adopt a simple weight search to adaptively integrate the three facet-level embeddings into a task-specific document embedding without task-aware fine-tuning. Experiments show the applicability and robustness of FLeW across multiple scientific tasks and fields, compared to prior models.
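The weight-search step can be illustrated with a minimal sketch. The abstract does not specify the search procedure, so the following assumes a plain grid search over non-negative weights that sum to 1, a weighted sum as the integration rule, and a caller-supplied `score` function standing in for a task's validation metric. All names here (`combine`, `simplex_grid`, `search_weights`, `score`) are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Facet names follow the three citation intents named in the abstract.
FACETS = ("background", "method", "result")


def combine(facets: Dict[str, Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of one document's three facet embeddings (assumed rule)."""
    dim = len(facets[FACETS[0]])
    return [
        sum(w * facets[name][i] for w, name in zip(weights, FACETS))
        for i in range(dim)
    ]


def simplex_grid(steps: int) -> List[Tuple[float, float, float]]:
    """All weight triples on a regular grid that are >= 0 and sum to 1."""
    grid = []
    for a, b in product(range(steps + 1), repeat=2):
        if a + b <= steps:
            grid.append((a / steps, b / steps, (steps - a - b) / steps))
    return grid


def search_weights(
    docs: List[Dict[str, Vector]],
    score: Callable[[List[Vector]], float],
    steps: int = 10,
) -> Tuple[Tuple[float, float, float], float]:
    """Return the grid point whose combined embeddings maximize `score`."""
    best_w, best_s = None, float("-inf")
    for w in simplex_grid(steps):
        embeddings = [combine(d, w) for d in docs]
        s = score(embeddings)
        if s > best_s:  # strict comparison keeps the first best grid point
            best_w, best_s = w, s
    return best_w, best_s


# Toy usage: a made-up "task" that only rewards the first coordinate,
# which in this toy data is carried by the background facet alone.
toy_docs = [{"background": [1.0, 0.0], "method": [0.0, 1.0], "result": [0.0, 0.0]}]
toy_score = lambda embs: sum(e[0] for e in embs)
best_weights, best_score = search_weights(toy_docs, toy_score)
```

Because only three weights are searched and no model parameters are updated, a search of this kind stays cheap, which is consistent with the abstract's claim of avoiding task-aware fine-tuning. In practice `score` would be a task metric computed on held-out data rather than the toy function above.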