ROSep 27, 2025
UniPrototype: Humn-Robot Skill Learning with Uniform PrototypesXiao Hu, Qi Yin, Yangming Shi et al.
Data scarcity remains a fundamental challenge in robot learning. While human demonstrations benefit from abundant motion capture data and vast internet resources, robotic manipulation suffers from limited training examples. To bridge this gap between human and robot manipulation capabilities, we propose UniPrototype, a novel framework that enables effective knowledge transfer from human to robot domains via shared motion primitives. ur approach makes three key contributions: (1) We introduce a compositional prototype discovery mechanism with soft assignments, enabling multiple primitives to co-activate and thus capture blended and hierarchical skills; (2) We propose an adaptive prototype selection strategy that automatically adjusts the number of prototypes to match task complexity, ensuring scalable and efficient representation; (3) We demonstrate the effectiveness of our method through extensive experiments in both simulation environments and real-world robotic systems. Our results show that UniPrototype successfully transfers human manipulation knowledge to robots, significantly improving learning efficiency and task performance compared to existing approaches.The code and dataset will be released upon acceptance at an anonymous repository.
QUANT-PHJul 19, 2017
Quantum gate identification: error analysis, numerical results and optical experimentYuanlong Wang, Qi Yin, Daoyi Dong et al.
The identification of an unknown quantum gate is a significant issue in quantum technology. In this paper, we propose a quantum gate identification method within the framework of quantum process tomography. In this method, a series of pure states are inputted to the gate and then a fast state tomography on the output states is performed and the data are used to reconstruct the quantum gate. Our algorithm has computational complexity $O(d^3)$ with the system dimension $d$. The algorithm is compared with maximum likelihood estimation method for the running time, which shows the efficiency advantage of our method. An error upper bound is established for the identification algorithm and the robustness of the algorithm against the purity of input states is also tested. We perform quantum optical experiment on single-qubit Hadamard gate to verify the effectiveness of the identification algorithm.
CVNov 30, 2015
Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4Cong Yao, Jianan Wu, Xinyu Zhou et al.
Different from focused texts present in natural images, which are captured with user's intention and intervention, incidental texts usually exhibit much more diversity, variability and complexity, thus posing significant difficulties and challenges for scene text detection and recognition algorithms. The ICDAR 2015 Robust Reading Competition Challenge 4 was launched to assess the performance of existing scene text detection and recognition methods on incidental texts as well as to stimulate novel ideas and solutions. This report is dedicated to briefly introduce our strategies for this challenging problem and compare them with prior arts in this field.
CVJun 10, 2015
ICDAR 2015 Text Reading in the Wild CompetitionXinyu Zhou, Shuchang Zhou, Cong Yao et al.
Recently, text detection and recognition in natural scenes are becoming increasing popular in the computer vision community as well as the document analysis community. However, majority of the existing ideas, algorithms and systems are specifically designed for English. This technical report presents the final results of the ICDAR 2015 Text Reading in the Wild (TRW 2015) competition, which aims at establishing a benchmark for assessing detection and recognition algorithms devised for both Chinese and English scripts and providing a playground for researchers from the community. In this article, we describe in detail the dataset, tasks, evaluation protocols and participants of this competition, and report the performance of the participating methods. Moreover, promising directions for future research are discussed.
CVJan 20, 2015
Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?Erjin Zhou, Zhimin Cao, Qi Yin
Face recognition performance improves rapidly with the recent deep learning technique developing and underlying large training dataset accumulating. In this paper, we report our observations on how big data impacts the recognition performance. According to these observations, we build our Megvii Face Recognition System, which achieves 99.50% accuracy on the LFW benchmark, outperforming the previous state-of-the-art. Furthermore, we report the performance in a real-world security certification scenario. There still exists a clear gap between machine recognition and human performance. We summarize our experiments and present three challenges lying ahead in recent face recognition. And we indicate several possible solutions towards these challenges. We hope our work will stimulate the community's discussion of the difference between research benchmark and real-world applications.
CVMar 12, 2014
Learning Deep Face RepresentationHaoqiang Fan, Zhimin Cao, Yuning Jiang et al.
Face representation is a crucial step of face recognition systems. An optimal face representation should be discriminative, robust, compact, and very easy-to-implement. While numerous hand-crafted and learning-based representations have been proposed, considerable room for improvement is still present. In this paper, we present a very easy-to-implement deep learning framework for face representation. Our method bases on a new structure of deep network (called Pyramid CNN). The proposed Pyramid CNN adopts a greedy-filter-and-down-sample operation, which enables the training procedure to be very fast and computation-efficient. In addition, the structure of Pyramid CNN can naturally incorporate feature sharing across multi-scale face representations, increasing the discriminative ability of resulting representation. Our basic network is capable of achieving high recognition accuracy ($85.8\%$ on LFW benchmark) with only 8 dimension representation. When extended to feature-sharing Pyramid CNN, our system achieves the state-of-the-art performance ($97.3\%$) on LFW benchmark. We also introduce a new benchmark of realistic face images on social network and validate our proposed representation has a good ability of generalization.