HiNS: Hierarchical Negative Sampling for More Comprehensive Memory Retrieval Embedding Model
This work improves memory retrieval for language agents by better modeling negative sample difficulty, though it is incremental as it focuses on data construction rather than a new paradigm.
The paper targets embedding models used for memory retrieval in language agents. It argues that the negative samples in existing training data lack both hierarchical difficulty and the distribution seen in natural interactions, and it reports that correcting this yields F1/BLEU-1 gains of up to 3.27%/3.30% on LoCoMo.
Memory-augmented language agents rely on embedding models for effective memory retrieval. However, existing training data construction overlooks a critical limitation: it ignores the hierarchical difficulty of negative samples and their natural distribution in human-agent interactions. In practice, some negatives are semantically close distractors while others are trivially irrelevant, and natural dialogue exhibits structured proportions of these types. Current approaches that use synthetic or uniformly sampled negatives fail to reflect this diversity, which limits an embedding model's ability to learn the nuanced discrimination essential for robust memory retrieval. In this work, we propose HiNS, a principled data construction framework that explicitly models negative sample difficulty tiers and incorporates empirically grounded negative ratios derived from conversational data, enabling the training of embedding models with substantially improved retrieval fidelity and generalization in memory-intensive tasks. Experiments show significant improvements: on LoCoMo, F1/BLEU-1 gains of 3.27%/3.30% (MemoryOS) and 1.95%/1.78% (Mem0); on PERSONAMEM, total score improvements of 1.19% (MemoryOS) and 2.55% (Mem0).
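
The idea of drawing negatives by difficulty tier at fixed proportions can be illustrated with a minimal sketch. This is not the HiNS implementation: the abstract does not specify the tiers, the ratios, or how candidates are assigned to tiers, so the function name `sample_tiered_negatives`, the tier labels `hard`/`medium`/`easy`, and the weights used below are all assumptions made for illustration.

```python
import random
from typing import Dict, List


def sample_tiered_negatives(
    pools: Dict[str, List[str]],
    ratios: Dict[str, float],
    k: int,
    seed: int = 0,
) -> List[str]:
    """Draw k negatives, splitting k across tiers in proportion to `ratios`.

    `pools` maps a (hypothetical) tier name to its candidate negatives;
    `ratios` maps the same tier names to non-negative weights.
    """
    rng = random.Random(seed)
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")

    # Largest-remainder allocation so the per-tier counts sum to exactly k.
    raw = {tier: k * weight / total for tier, weight in ratios.items()}
    counts = {tier: int(value) for tier, value in raw.items()}
    leftover = k - sum(counts.values())
    by_remainder = sorted(raw, key=lambda t: raw[t] - counts[t], reverse=True)
    for tier in by_remainder[:leftover]:
        counts[tier] += 1

    negatives: List[str] = []
    for tier, n in counts.items():
        pool = pools[tier]
        if n > len(pool):
            raise ValueError(f"tier {tier!r} has only {len(pool)} candidates, need {n}")
        # Sample without replacement inside each tier.
        negatives.extend(rng.sample(pool, n))
    return negatives


# Usage with made-up pools and weights (assumed values, not from the paper).
pools = {
    "hard": [f"hard_{i}" for i in range(5)],      # semantically close distractors
    "medium": [f"medium_{i}" for i in range(5)],
    "easy": [f"easy_{i}" for i in range(10)],     # trivially irrelevant memories
}
ratios = {"hard": 2, "medium": 3, "easy": 5}
negatives = sample_tiered_negatives(pools, ratios, k=10)
```

Uniform sampling would correspond to pooling all candidates together; keeping the tiers separate is what allows the mix of hard and easy negatives to follow a chosen proportion, such as one estimated from conversational data.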