DeepNote: Note-Centric Deep Retrieval-Augmented GenerationRuobing Wang, Qingfei Zhao, Yukun Yan et al.
Retrieval-Augmented Generation (RAG) mitigates factual errors and hallucinations in Large Language Models (LLMs) for question-answering (QA) by incorporating external knowledge. However, existing adaptive RAG methods rely on LLMs to predict retrieval timing and directly use retrieved information for generation, often failing to reflect real information needs and fully leverage retrieved knowledge. We develop DeepNote, an adaptive RAG framework that achieves in-depth and robust exploration of knowledge sources through note-centric adaptive retrieval. DeepNote employs notes as carriers for refining and accumulating knowledge. During in-depth exploration, it uses these notes to determine retrieval timing, formulate retrieval queries, and iteratively assess knowledge growth, ultimately leveraging the best note for answer generation. Extensive experiments and analyses demonstrate that DeepNote significantly outperforms all baselines (+10.2% to +20.1%) and exhibits the ability to gather knowledge with both high density and quality. Additionally, DPO further improves the performance of DeepNote. The code and data are available at https://github.com/thunlp/DeepNote.
28.4ROOct 31, 2024
3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile SensingBinghao Huang, Yixuan Wang, Xinyi Yang et al.
Tactile and visual perception are both crucial for humans to perform fine-grained interactions with their environment. Developing similar multi-modal sensing capabilities for robots can significantly enhance and expand their manipulation skills. This paper introduces \textbf{3D-ViTac}, a multi-modal sensing and learning system designed for dexterous bimanual manipulation. Our system features tactile sensors equipped with dense sensing units, each covering an area of 3$mm^2$. These sensors are low-cost and flexible, providing detailed and extensive coverage of physical contacts, effectively complementing visual information. To integrate tactile and visual data, we fuse them into a unified 3D representation space that preserves their 3D structures and spatial relationships. The multi-modal representation can then be coupled with diffusion policies for imitation learning. Through concrete hardware experiments, we demonstrate that even low-cost robots can perform precise manipulations and significantly outperform vision-only policies, particularly in safe interactions with fragile items and executing long-horizon tasks involving in-hand manipulation. Our project page is available at \url{https://binghao-huang.github.io/3D-ViTac/}.
1.8LGDec 18, 2019
Learning to Prevent Leakage: Privacy-Preserving Inference in the Mobile CloudShuang Zhang, Liyao Xiang, Congcong Li et al.
Powered by machine learning services in the cloud, numerous learning-driven mobile applications are gaining popularity in the market. As deep learning tasks are mostly computation-intensive, it has become a trend to process raw data on devices and send the deep neural network (DNN) features to the cloud, where the features are further processed to return final results. However, there is always unexpected leakage with the release of features, with which an adversary could infer a significant amount of information about the original data. We propose a privacy-preserving reinforcement learning framework on top of the mobile cloud infrastructure from the perspective of DNN structures. The framework aims to learn a policy to modify the base DNNs to prevent information leakage while maintaining high inference accuracy. The policy can also be readily transferred to large-size DNNs to speed up learning. Extensive evaluations on a variety of DNNs have shown that our framework can successfully find privacy-preserving DNN structures to defend different privacy attacks.