CV · Mar 16
Towards Generalizable Robotic Manipulation in Dynamic Environments
Heng Fang, Shangru Li, Shuhan Wang et al.
Vision-Language-Action (VLA) models excel in static manipulation but struggle in dynamic environments with moving targets. This performance gap primarily stems from a scarcity of dynamic manipulation datasets and the reliance of mainstream VLAs on single-frame observations, restricting their spatiotemporal reasoning capabilities. To address this, we introduce DOMINO, a large-scale dataset and benchmark for generalizable dynamic manipulation, featuring 35 tasks with hierarchical complexities, over 110K expert trajectories, and a multi-dimensional evaluation suite. Through comprehensive experiments, we systematically evaluate existing VLAs on dynamic tasks, explore effective training strategies for dynamic awareness, and validate the generalizability of dynamic data. Furthermore, we propose PUMA, a dynamics-aware VLA architecture. By integrating scene-centric historical optical flow and specialized world queries to implicitly forecast object-centric future states, PUMA couples history-aware perception with short-horizon prediction. Results demonstrate that PUMA achieves state-of-the-art performance, yielding a 6.3% absolute improvement in success rate over baselines. Moreover, we show that training on dynamic data fosters robust spatiotemporal representations that transfer to static tasks. All code and data are available at https://github.com/H-EmbodVis/DOMINO.
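The abstract reports an absolute improvement in success rate, which is measured in percentage points and is not the same as a relative improvement. The sketch below shows the difference. The baseline and new success rates are hypothetical values chosen only so that the absolute gain comes out to 6.3 points; the abstract does not state the underlying rates, and the function names are illustrative.

```python
def absolute_gain(new_rate: float, base_rate: float) -> float:
    """Gain in percentage points: the plain difference of two success rates."""
    return (new_rate - base_rate) * 100.0


def relative_gain(new_rate: float, base_rate: float) -> float:
    """Gain as a percentage of the baseline success rate."""
    return (new_rate - base_rate) / base_rate * 100.0


# Hypothetical success rates (assumed for illustration, not taken from the paper).
base_rate = 0.500
new_rate = 0.563

print(f"absolute gain: {absolute_gain(new_rate, base_rate):.1f} points")
print(f"relative gain: {relative_gain(new_rate, base_rate):.1f}% of baseline")
```

With the same absolute gain, the relative gain grows as the baseline shrinks, which is why the two figures should not be compared directly.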
IR · May 26, 2019
Adaptive Learning Material Recommendation in Online Language Education
Shuhan Wang, Hao Wu, Ji Hun Kim et al.
Recommending personalized learning materials for online language learning is challenging because we typically lack data about the student's ability and the relative difficulty of learning materials. This makes it hard to recommend appropriate content that matches the student's prior knowledge. In this paper, we propose a refined hierarchical knowledge structure to model vocabulary knowledge, which enables us to automatically organize authentic and up-to-date learning materials collected from the internet. Based on this knowledge structure, we then introduce a hybrid approach to recommending learning materials that adapts to a student's language level. We evaluate our work with an online Japanese learning tool, and the results suggest that adding adaptivity to material recommendation significantly increases student engagement.
CL · Sep 16, 2016
Grammatical Templates: Improving Text Difficulty Evaluation for Language Learners
Shuhan Wang, Erik Andersen
Language students are most engaged while reading texts at an appropriate difficulty level. However, existing methods of evaluating text difficulty focus mainly on vocabulary and do not prioritize grammatical features, so they do not work well for language learners with limited knowledge of grammar. In this paper, we introduce grammatical templates, the expert-identified units of grammar that students learn in class, as an important feature for text difficulty evaluation. Experimental classification results show that grammatical template features significantly improve text difficulty prediction accuracy over baseline readability features by 7.4%. Moreover, we build a simple and human-understandable text difficulty evaluation approach with 87.7% accuracy, using only 5 grammatical template features.
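To make the idea of template-based features concrete, here is a minimal sketch that counts how often each grammatical template occurs in a text. It is not the authors' implementation: the three English regex patterns and the names `TEMPLATES` and `template_features` are hypothetical stand-ins for the expert-identified templates the abstract describes, and a real system would need far more careful matching.

```python
import re

# Hypothetical templates for illustration only. These crude English patterns
# are assumptions, not the expert-identified templates from the paper.
TEMPLATES = {
    "passive": re.compile(r"\b(?:is|are|was|were|been|being)\s+\w+ed\b", re.IGNORECASE),
    "present_perfect": re.compile(r"\b(?:has|have)\s+\w+ed\b", re.IGNORECASE),
    "conditional_if": re.compile(r"\bif\b", re.IGNORECASE),
}


def template_features(text: str) -> dict:
    """Map each template name to the number of times it matches in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in TEMPLATES.items()}


sample = "The book was borrowed. If she has finished, we leave."
print(template_features(sample))
```

Counts like these could be passed to any standard classifier as a feature vector alongside vocabulary-based readability features. Because each feature corresponds to a named unit of grammar, such a model stays easy for a person to inspect.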