CLNov 15, 2023Code
MELA: Multilingual Evaluation of Linguistic AcceptabilityZiyin Zhang, Yikang Liu, Weifang Huang et al.
In this work, we present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability -- MELA, with 46K samples covering 10 languages from a diverse set of language families. We establish LLM baselines on this benchmark, and investigate cross-lingual transfer in acceptability judgements with XLM-R. In pursuit of multilingual interpretability, we conduct probing experiments with fine-tuned XLM-R to explore the process of syntax capability acquisition. Our results show that GPT-4o exhibits a strong multilingual ability, outperforming fine-tuned XLM-R, while open-source multilingual models lag behind by a noticeable gap. Cross-lingual transfer experiments show that transfer in acceptability judgment is non-trivial: 500 Icelandic fine-tuning examples lead to 23 MCC performance in a completely unrelated language -- Chinese. Results of our probing experiments indicate that training on MELA improves the performance of XLM-R on syntax-related tasks. Our data is available at https://github.com/sjtu-compling/MELA.
12.9ROJun 4
SCOUT: Semantic scene COverage via Uncertainty-guided TraversalJunyu Mao, Sara Ayoubi, Vishnu D. Sharma et al.
Robots that operate over extended periods should not merely visit space; they should progressively understand it. Yet most 3D scene graph pipelines treat perception as a post-processing stage over a fixed dataset, decoupling scene representation from the decisions that determine what is observed in the first place. We present SCOUT, an online semantic exploration framework that closes this loop by coupling active traversal with probabilistic scene graph construction. Given a prior 2D occupancy map and posed RGB-D observations, SCOUT incrementally builds an uncertainty-aware 3D scene graph whose nodes maintain fused geometry and posterior beliefs over open-vocabulary object labels, while edges encode structural relations such as on, inside, belong, and next to. These beliefs are fed back to an uncertainty-guided traversal planner, which selects viewpoints by balancing expected semantic certainty gain, geometric coverage gain, and travel cost. In this way, the robot revisits ambiguous objects when additional evidence matters and expands into unseen free space when the scene remains incomplete. The resulting system treats semantic scene completeness as an operational objective rather than a passive by-product of semantic mapping, moving toward autonomous agents that can patrol, update, and reason about evolving indoor environments with minimal human intervention.
CLMay 23, 2023
Do prompt positions really matter?Junyu Mao, Stuart E. Middleton, Mahesan Niranjan
Prompt-based models have gathered a lot of attention from researchers due to their remarkable advancements in the fields of zero-shot and few-shot learning. Developing an effective prompt template plays a critical role. However, prior studies have mainly focused on prompt vocabulary searching or embedding initialization within a predefined template with the prompt position fixed. In this empirical study, we conduct the most comprehensive analysis to date of prompt position for diverse Natural Language Processing (NLP) tasks. Our findings quantify the substantial impact prompt position has on model performance. We observe that the prompt positions used in prior studies are often sub-optimal, and this observation is consistent even in widely used instruction-tuned models. These findings suggest prompt position optimisation as a valuable research direction to augment prompt engineering methodologies and prompt position-aware instruction tuning as a potential way to build more robust models in the future.