92.8ROMay 28
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided RepresentationJusuk Lee, Seungjae Lee, Jonghun Shin et al.
Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual encoders pre-trained for static recognition or vision-language alignment, leaving motion understanding to downstream policies. We introduce DynaFLIP, a dynamics-aware multimodal pre-training framework that pushes motion understanding upstream into perception. We construct image-language-3D flow triplets from heterogeneous human and robot videos, and use these triplets as training-time supervision to shape an image-only encoder. Our key idea is to encourage the three modalities to span a small simplex volume in the shared hyperspherical space -- a smaller simplex volume indicating stronger alignment. To avoid the geometric ambiguity and trivial collapse of naive volume minimization, we combine simplex-volume minimization with a cosine regularizer and a contrastive objective. Our analyses show that DynaFLIP focuses on control-relevant regions critical for manipulation. The resulting dynamics-aware representations serve as reusable visual backbones and consistently outperform baselines across diverse downstream policies, including VLAs. We validate this across diverse simulation and real-world setups, with gains reaching +22.5% under out-of-distribution scenarios. Our results suggest that robot generalization improves when visual representations are trained to encode not just what is present, but how the world changes under action.
HCMay 8, 2020
Choose Your Own Question: Encouraging Self-Personalization in Learning Path ConstructionYoungduck Choi, Yoonho Na, Youngjik Yoon et al.
Learning Path Recommendation is the heart of adaptive learning, the educational paradigm of an Interactive Educational System (IES) providing a personalized learning experience based on the student's history of learning activities. In typical existing IESs, the student must fully consume a recommended learning item to be provided a new recommendation. This workflow comes with several limitations. For example, there is no opportunity for the student to give feedback on the choice of learning items made by the IES. Furthermore, the mechanism by which the choice is made is opaque to the student, limiting the student's ability to track their learning. To this end, we introduce Rocket, a Tinder-like User Interface for a general class of IESs. Rocket provides a visual representation of Artificial Intelligence (AI)-extracted features of learning materials, allowing the student to quickly decide whether the material meets their needs. The student can choose between engaging with the material and receiving a new recommendation by swiping or tapping. Rocket offers the following potential improvements for IES User Interfaces: First, Rocket enhances the explainability of IES recommendations by showing students a visual summary of the meaningful AI-extracted features used in the decision-making process. Second, Rocket enables self-personalization of the learning experience by leveraging the students' knowledge of their own abilities and needs. Finally, Rocket provides students with fine-grained information on their learning path, giving them an avenue to assess their own skills and track their learning progress. We present the source code of Rocket, in which we emphasize the independence and extensibility of each component, and make it publicly available for all purposes.
LGJan 1, 2020
Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational SystemsYoungduck Choi, Youngnam Lee, Junghyun Cho et al.
Like many other domains in Artificial Intelligence (AI), there are specific tasks in the field of AI in Education (AIEd) for which labels are scarce and expensive, such as predicting exam score or review correctness. A common way of circumventing label-scarce problems is pre-training a model to learn representations of the contents of learning items. However, such methods fail to utilize the full range of student interaction data available and do not model student learning behavior. To this end, we propose Assessment Modeling, a class of fundamental pre-training tasks for general interactive educational systems. An assessment is a feature of student-system interactions which can serve as a pedagogical evaluation. Examples include the correctness and timeliness of a student's answer. Assessment Modeling is the prediction of assessments conditioned on the surrounding context of interactions. Although it is natural to pre-train on interactive features available in large amounts, limiting the prediction targets to assessments focuses the tasks' relevance to the label-scarce educational problems and reduces less-relevant noise. While the effectiveness of different combinations of assessments is open for exploration, we suggest Assessment Modeling as a first-order guiding principle for selecting proper pre-training tasks for label-scarce educational problems.