Xiaoyu Ren

CV
h-index49
6papers
18citations
Novelty48%
AI Score45

6 Papers

CVMar 8, 2023Code
FCN+: Global Receptive Convolution Makes FCN Great Again

Xiaoyu Ren, Zhongying Deng, Jin Ye et al.

Fully convolutional network (FCN) is a seminal work for semantic segmentation. However, due to its limited receptive field, FCN cannot effectively capture global context information which is vital for semantic segmentation. As a result, it is beaten by state-of-the-art methods that leverage different filter sizes for larger receptive fields. However, such a strategy usually introduces more parameters and increases the computational cost. In this paper, we propose a novel global receptive convolution (GRC) to effectively increase the receptive field of FCN for context information extraction, which results in an improved FCN termed FCN+. The GRC provides the global receptive field for convolution without introducing any extra learnable parameters. The motivation of GRC is that different channels of a convolutional filter can have different grid sampling locations across the whole input feature map. Specifically, the GRC first divides the channels of the filter into two groups. The grid sampling locations of the first group are shifted to different spatial coordinates across the whole feature map, according to their channel indexes. This can help the convolutional filter capture the global context information. The grid sampling location of the second group remains unchanged to keep the original location information. By convolving using these two groups, the GRC can integrate the global context into the original location information of each pixel for better dense prediction results. With the GRC built in, FCN+ can achieve comparable performance to state-of-the-art methods for semantic segmentation tasks, as verified on PASCAL VOC 2012, Cityscapes, and ADE20K. Our code will be released at https://github.com/Zhongying-Deng/FCN_Plus.

72.9CVMar 29
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Zhongying Deng, Cheng Tang, Ziyan Huang et al. · pku

Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the thrive of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembling of such medical datasets are highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of large-scale unified medical datasets and hindering the development of powerful medical foundation models. In this work, we present the largest survey to date of medical image datasets, covering over 1,000 open-access datasets with a systematic catalog of their modalities, tasks, anatomies, annotations, limitations, and potential for integration. Our analysis exposes a landscape that is modest in scale, fragmented across narrowly scoped tasks, and unevenly distributed across organs and modalities, which in turn limits the utility of existing medical image datasets for developing versatile and robust medical foundation models. To turn fragmentation into scale, we propose a metadata-driven fusion paradigm (MDFP) that integrates public datasets with shared modalities or tasks, thereby transforming multiple small data silos into larger, more coherent resources. Building on MDFP, we release an interactive discovery portal that enables end-to-end, automated medical image dataset integration, and compile all surveyed datasets into a unified, structured table that clearly summarizes their key characteristics and provides reference links, offering the community an accessible and comprehensive repository. By charting the current terrain and offering a principled path to dataset consolidation, our survey provides a practical roadmap for scaling medical imaging corpora, supporting faster data discovery, more principled dataset creation, and more capable medical foundation models.

64.4ROApr 27
Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning

Chenhao Liu, Leyun Jiang, Yibo Wang et al.

Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion and manipulation, yet dynamic real-world interactions remain challenging. As a step toward fast-moving object interactions, we present a reinforcement-learning training pipeline that yields a unified whole-body controller for humanoid badminton, coordinating footwork and striking without motion priors or expert demonstrations. Training follows a three-stage curriculum (footwork acquisition, precision-guided swing generation, and task-focused refinement) so legs and arms jointly serve the hitting objective. For deployment, we use an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking, and also develop a prediction-free variant that removes the EKF and explicit prediction. We validate the framework with five sets of experiments in simulation and on hardware. In simulation, two robots sustain a rally of 21 consecutive hits. In real-world tests with both machine-fed shuttles and human-robot rallies, the robot achieves outgoing shuttle speeds up to 19.1~m/s with a mean return landing distance of 4~m. Moreover, the prediction-free variant attains comparable performance to the EKF-based target-known policy. Overall, our approach enables dynamic yet precise goal striking in humanoid badminton and suggests a path toward more dynamics-critical whole-body interaction tasks.

CVJan 5, 2025Code
Facial Attractiveness Prediction in Live Streaming: A New Benchmark and Multi-modal Method

Hui Li, Xiaoyu Ren, Hongjiu Yu et al.

Facial attractiveness prediction (FAP) has long been an important computer vision task, which could be widely applied in live streaming for facial retouching, content recommendation, etc. However, previous FAP datasets are either small, closed-source, or lack diversity. Moreover, the corresponding FAP models exhibit limited generalization and adaptation ability. To overcome these limitations, in this paper we present LiveBeauty, the first large-scale live-specific FAP dataset, in a more challenging application scenario, i.e., live streaming. 10,000 face images are collected from a live streaming platform directly, with 200,000 corresponding attractiveness annotations obtained from a well-devised subjective experiment, making LiveBeauty the largest open-access FAP dataset in the challenging live scenario. Furthermore, a multi-modal FAP method is proposed to measure the facial attractiveness in live streaming. Specifically, we first extract holistic facial prior knowledge and multi-modal aesthetic semantic features via a Personalized Attractiveness Prior Module (PAPM) and a Multi-modal Attractiveness Encoder Module (MAEM), respectively, then integrate the extracted features through a Cross-Modal Fusion Module (CMFM). Extensive experiments conducted on both LiveBeauty and other open-source FAP datasets demonstrate that our proposed method achieves state-of-the-art performance. Dataset will be available soon.

RODec 20, 2021
Prioritized Hierarchical Compliance Control for Dual-Arm Robot Stable Clamping

Xiaoyu Ren, Liqun Huang, Mingguo Zhao

When a dual-arm robot clamps a rigid object in an environment for human beings, the environment or the collaborating human will impose incidental disturbance on the operated object or the robot arm, leading to clamping failure, damaging the robot even hurting the human. This research proposes a prioritized hierarchical compliance control to simultaneously deal with the two types of disturbances in the dual-arm robot clamping. First, we use hierarchical quadratic programming (HQP) to solve the robot inverse kinematics under the joint constraints and prioritize the compliance for the disturbance on the object over that on the robot arm. Second, we estimate the disturbance forces throughout the momentum observer with the F/T sensors and adopt admittance control to realize the compliances. Finally, we perform the verify experiments on a 14-DOF position-controlled dual-arm robot WalkerX, clamping a rigid object stably while realizing the compliance against the disturbances.

ROAug 9, 2021
Dynamic Balancing of Humanoid Robot Walker3 with Proprioceptive Actuation: Systematic Design of Algorithm, Software and Hardware

Yan Xie, Jiajun Wang, Hao Dong et al.

Dynamic balancing under uncertain disturbances is important for a humanoid robot, which requires a good capability of coordinating the entire body redundancy to execute multi tasks. Whole-body control (WBC) based on hierarchical optimization has been generally accepted and utilized in torque-controlled robots. A good hierarchy is the prerequisite for WBC and can be predefined according to prior knowledge. However, the real-time computation would be problematic in the physical applications considering the computational complexity of WBC. For robots with proprioceptive actuation, the joint friction in gear reducer would also degrade the torque tracking performance. In our paper, a reasonable hierarchy of tasks and constraints is first customized for robot dynamic balancing. Then a real-time WBC is implemented via a computationally efficient WBC software. Such a method is solved on a modular master control system UBTMaster characterized by the real-time communication and powerful computing capability. After the joint friction being well covered by the model identification, extensive experiments on various balancing scenarios are conducted on a humanoid Walker3 with proprioceptive actuation. The robot shows an outstanding balance performance even under external impulses as well as the two feet of the robot suffering the inclination and shift disturbances independently. The results demonstrate that with the strict hierarchy, real-time computation and joint friction being handled carefully, the robot with proprioceptive actuation can manage the dynamic physical interactions with the unstructured environments well.