CV · Jan 30
Triage: Hierarchical Visual Budgeting for Efficient Video Reasoning in Vision-Language Models
Anmin Wang, Nan Zhang, Wei Tao et al.
Vision-Language Models (VLMs) face significant computational challenges in video processing due to massive data redundancy, which creates prohibitively long token sequences. To address this, we introduce Triage, a training-free, plug-and-play framework that reframes video reasoning as a resource allocation problem via hierarchical visual budgeting. Its first stage, Frame-Level Budgeting, identifies keyframes by evaluating their visual dynamics and relevance, generating a strategic prior based on their importance scores. Guided by this prior, the second stage, Token-Level Budgeting, allocates tokens in two phases: it first secures high-relevance Core Tokens, followed by diverse Context Tokens selected with an efficient batched Maximal Marginal Relevance (MMR) algorithm. Extensive experiments demonstrate that Triage improves inference speed and reduces memory footprint, while maintaining or surpassing the performance of baselines and other methods on various video reasoning benchmarks.
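As a rough illustration of the two-phase token allocation described in this abstract, the sketch below first keeps the most relevant tokens and then adds diverse ones with a greedy Maximal Marginal Relevance loop. It is not the authors' implementation: the function name `select_tokens`, the trade-off weight `lam`, the cosine-similarity measure, and the plain greedy loop (rather than the paper's batched variant) are all assumptions made for this example.

```python
import numpy as np

def select_tokens(feats, relevance, n_core, n_context, lam=0.5):
    """Hypothetical helper: top-relevance core tokens, then greedy MMR context tokens."""
    feats = np.asarray(feats, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    # Normalise rows so that dot products are cosine similarities.
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)

    # Phase 1: core tokens are simply the n_core most relevant ones.
    order = np.argsort(-relevance, kind="stable")
    core = order[:n_core].tolist()

    available = np.ones(len(feats), dtype=bool)
    available[core] = False
    # max_sim[i] is the highest similarity of token i to any selected token.
    if core:
        max_sim = (unit @ unit[core].T).max(axis=1)
    else:
        max_sim = np.zeros(len(feats))

    # Phase 2: greedy MMR, trading relevance against redundancy.
    context = []
    for _ in range(min(n_context, int(available.sum()))):
        score = lam * relevance - (1.0 - lam) * max_sim
        score = np.where(available, score, -np.inf)
        best = int(np.argmax(score))
        context.append(best)
        available[best] = False
        # Update redundancy against the newly selected token in one vectorised step.
        max_sim = np.maximum(max_sim, unit @ unit[best])
    return core, context

# Toy example: tokens 0, 1 and 3 are near-duplicates, token 2 is distinct.
feats = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.9, 0.1]]
relevance = [0.9, 0.85, 0.3, 0.5]
core, context = select_tokens(feats, relevance, n_core=1, n_context=1)
```

In this toy example the second most relevant token is passed over because it is nearly identical to the core token, which is the behaviour the diversity term is meant to produce.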
CV · Jun 5, 2025
Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning
Ziqi Jia, Anmin Wang, Xiaoyang Qu et al.
Previous continual learning setups for embodied intelligence focused on executing low-level actions based on human commands, neglecting the ability to learn high-level planning and multi-level knowledge. To address these issues, we propose the Hierarchical Embodied Continual Learning Setups (HEC) that divide the agent's continual learning process into two layers: high-level instructions and low-level actions, and define five embodied continual learning sub-setups. Building on these setups, we introduce the Task-aware Mixture of Incremental LoRA Experts (Task-aware MoILE) method. This approach achieves task recognition by clustering visual-text embeddings and uses both a task-level router and a token-level router to select the appropriate LoRA experts. To effectively address the issue of catastrophic forgetting, we apply Singular Value Decomposition (SVD) to the LoRA parameters obtained from prior tasks, preserving key components while orthogonally training the remaining parts. The experimental results show that our method stands out in reducing the forgetting of old tasks compared to other methods, effectively supporting agents in retaining prior knowledge while continuously learning new tasks.
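One way to picture the SVD step mentioned in this abstract is sketched below: decompose a prior task's LoRA update, keep its leading singular directions, and project new gradients onto the orthogonal complement of that subspace. This is only a guess at the general idea, not the paper's procedure; the function names `preserved_basis` and `project_out`, the choice to decompose the product `B @ A`, and the use of left singular vectors are all assumptions made for this example.

```python
import numpy as np

def preserved_basis(B, A, keep):
    """Hypothetical helper: leading left singular vectors of the LoRA update B @ A."""
    delta = B @ A  # low-rank weight update learned on a prior task
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :keep]

def project_out(grad, U_keep):
    """Remove the component of grad that lies in the preserved subspace."""
    return grad - U_keep @ (U_keep.T @ grad)

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))   # assumed LoRA "up" factor, rank 2
A = rng.normal(size=(2, 5))   # assumed LoRA "down" factor
U_keep = preserved_basis(B, A, keep=2)

grad = rng.normal(size=(6, 5))        # stand-in gradient for a new task
safe_grad = project_out(grad, U_keep)  # orthogonal to the preserved directions
```

Updating with `safe_grad` leaves the preserved directions untouched, which is the intuition behind training the remaining components orthogonally to reduce forgetting.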
QUANT-PH · Aug 14, 2017
Quantum estimation of detection efficiency with no-knowledge quantum feedback
Dong Xie, Chunling Xu, Jianyong Chen et al.
We investigate the estimation precision of the detection efficiency obtained with no-knowledge measurement-based feedback control. Among the feedback operators we consider, no-knowledge measurement is the optimal way to estimate the detection efficiency. We show that higher precision can be achieved when the detection efficiency is either low or high. We also find that no-knowledge feedback can be used to cancel decoherence. With a high detection efficiency, no-knowledge feedback performs well in estimating the frequency and the detection efficiency simultaneously, and simultaneous estimation outperforms independent estimation with the same probes.