CV, Nov 19, 2023
Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection
Lv Tang, Peng-Tao Jiang, Zhihao Shen et al.
In this paper, we introduce a novel multimodal camo-perceptive framework (MMCPF) for zero-shot Camouflaged Object Detection (COD) that leverages the capabilities of Multimodal Large Language Models (MLLMs). Current COD methods rely predominantly on supervised learning, which demands extensive, accurately annotated datasets and generalizes weakly; the zero-shot MMCPF circumvents these limitations. Although MLLMs hold significant potential for broad applications, their effectiveness in COD is limited, and they tend to misinterpret camouflaged objects. To address this challenge, we further propose the Chain of Visual Perception (CoVP), an enhancement that significantly improves the perceptual capabilities of MLLMs in camouflaged scenes by leveraging both linguistic and visual cues more effectively. We validate MMCPF on five widely used COD datasets: CAMO, COD10K, NC4K, MoCA-Mask, and OVCamo. Experiments show that MMCPF outperforms all existing state-of-the-art zero-shot COD methods and achieves performance competitive with weakly-supervised and fully-supervised methods, demonstrating its potential. The code is available at https://github.com/luckybird1994/MMCPF.
SP, Jul 20, 2019
Retrieving Similar Trajectories from Cellular Data at City Scale
Zhihao Shen, Wan Du, Xi Zhao et al.
Retrieving similar trajectories from a large trajectory dataset is important for a variety of applications, such as transportation planning and mobility analysis. Unlike previous works based on fine-grained GPS trajectories, this paper investigates the feasibility of identifying similar trajectories from cellular data observed by mobile infrastructure, which provide more comprehensive coverage. To handle the large localization errors and low sampling rates of cellular data, we develop a holistic system, cellSim, which seamlessly integrates map matching and similar trajectory search. A set of map matching techniques is proposed to transform cell tower sequences into moving trajectories on a road map by considering the unique features of cellular data, such as the dynamic density of cell towers and bidirectional roads. To further improve the accuracy of similarity search, map matching outputs M trajectory candidates with different confidence levels, and a new similarity measure is developed to process the map matching results. Meanwhile, M is dynamically adapted to maintain a low false positive rate in the similarity search, and two pruning schemes are proposed to minimize the computation overhead. Extensive experiments on a large-scale dataset and real-world trajectories of 1701 km show that cellSim provides high accuracy (precision of 62.4% and recall of 89.8%).