Yongrui Qin

7.2CVMay 19

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

Kai Sheng, Liuyi Wang, Haojie Dai et al.

Vision-and-language navigation (VLN) requires an embodied agent to ground natural-language instructions into executable navigation actions in unseen environments. Existing zero-shot methods typically rely on additional waypoint prediction modules, which often entangle high-level directional reasoning with fine-grained local grounding, leading to error-prone and unstable decisions. In this paper, we propose P2DNav, a hierarchical framework for zero-shot vision-and-language navigation. P2DNav consists of three core components: Panorama-to-Downview (P2D), Sliding-Window Dialogue Memory (SDM), and Reflective Reorientation Mechanism (RRM). P2D explicitly decomposes navigation decision-making into two stages: panoramic direction selection and downview local grounding. It first selects the instruction-relevant direction from a 360° panorama, and then predicts a pixel-level target point from the downview RGB observation in that direction. In addition, SDM organizes navigation history as a multi-turn dialogue context and maintains recent visual observations within a sliding window to support long-horizon navigation. RRM further enables reflective reorientation by assessing the reliability of local grounding based on the downview observation and returning to panoramic direction selection when necessary. Experiments on the R2R-CE benchmark show that P2DNav achieves strong performance among zero-shot methods. In particular, compared with the state-of-the-art (SOTA) zero-shot waypoint-based and waypoint-free methods, P2DNav achieves SR gains of 146.6% and 58.9%, respectively, demonstrating the effectiveness of P2D, SDM, and RRM for zero-shot VLN. Code will be released for public use.

2.7IRJul 23, 2016

Searching for the Internet of Things on the Web: Where It Is and What It Looks Like

Ali Shemshadi, Quan Z. Sheng, Wei Emma Zhang et al.

The Internet of Things (IoT), in general, is a compelling paradigm that aims to connect everyday objects to the Internet. Nowadays, IoT is considered as one of the main technologies which contribute towards reshaping our daily lives in the next decade. IoT unlocks many exciting new opportunities in a variety of applications in research and industry domains. However, many have complained about the absence of the real-world IoT data. Unsurprisingly, a common question that arises regularly nowadays is "Does the IoT already exist?". So far, little has been known about the real-world situation on IoT, its attributes, the presentation of data and user interests. To answer this question, in this work, we conduct an in-depth analytical investigation on real IoT data. More specifically, we identify IoT data sources over the Web and develop a crawler engine to collect large-scale real-world IoT data for the first time. We make the results of our work available to the public in order to assist the community in the future research. In particular, we collect the data of nearly two million Internet connected objects and study trends in IoT using a real-world query set from an IoT search engine. Based on the collected data and our analysis, we identify the typical characteristics of IoT data. The most intriguing finding of our study is that IoT data is mainly disseminated using Web Mapping while the emerging IoT solutions such as the Web of Things, are currently not well adopted. On top of our findings, we further discuss future challenges and open research problems in the IoT area.

Yongrui Qin

2 Papers