Feihong Liu

IV
h-index14
6papers
41citations
Novelty49%
AI Score43

6 Papers

IVMay 5, 2022
Invariant Content Synergistic Learning for Domain Generalization of Medical Image Segmentation

Yuxin Kang, Hansheng Li, Xuan Zhao et al.

While achieving remarkable success for medical image segmentation, deep convolution neural networks (DCNNs) often fail to maintain their robustness when confronting test data with the novel distribution. To address such a drawback, the inductive bias of DCNNs is recently well-recognized. Specifically, DCNNs exhibit an inductive bias towards image style (e.g., superficial texture) rather than invariant content (e.g., object shapes). In this paper, we propose a method, named Invariant Content Synergistic Learning (ICSL), to improve the generalization ability of DCNNs on unseen datasets by controlling the inductive bias. First, ICSL mixes the style of training instances to perturb the training distribution. That is to say, more diverse domains or styles would be made available for training DCNNs. Based on the perturbed distribution, we carefully design a dual-branches invariant content synergistic learning strategy to prevent style-biased predictions and focus more on the invariant content. Extensive experimental results on two typical medical image segmentation tasks show that our approach performs better than state-of-the-art domain generalization methods.

ROMay 12
World Action Models: The Next Frontier in Embodied AI

Siyin Wang, Junhao Shi, Zhaoyang Fu et al.

Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify predictive state modeling with action generation, targeting a joint distribution over future states and actions rather than actions alone. However, the literature remains fragmented across architectures, learning objectives, and application scenarios, lacking a unified conceptual framework. We formally define WAMs and disambiguate them from related concepts, and trace the foundations and early integration of VLA and world model research that gave rise to this paradigm. We organize existing methods into a structured taxonomy of Cascaded and Joint WAMs, with further subdivision by generation modality, conditioning mechanism, and action decoding strategy. We systematically analyze the data ecosystem fueling WAMs development, spanning robot teleoperation, portable human demonstrations, simulation, and internet-scale egocentric video, and synthesize emerging evaluation protocols organized around visual fidelity, physical commonsense, and action plausibility. Overall, this survey provides the first systematic account of the WAMs landscape, clarifies key architectural paradigms and their trade-offs, and identifies open challenges and future opportunities for this rapidly evolving field.

ROOct 27, 2025
RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Siyin Wang, Jinlan Fu, Feihong Liu et al.

Recent advances in Multimodal Large Language Models (MLLMs) have driven rapid progress in Vision-Language-Action (VLA) models for robotic manipulation. Although effective in many scenarios, current approaches largely rely on explicit instructions, whereas in real-world interactions, humans rarely issue instructions directly. Effective collaboration requires robots to infer user intentions proactively. In this work, we introduce cross-modal contextual instructions, a new setting where intent is derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands. To address this new setting, we present RoboOmni, a Perceiver-Thinker-Talker-Executor framework based on end-to-end omni-modal LLMs that unifies intention recognition, interaction confirmation, and action execution. RoboOmni fuses auditory and visual signals spatiotemporally for robust intention recognition, while supporting direct speech interaction. To address the absence of training data for proactive intention recognition in robotic manipulation, we build OmniAction, comprising 140k episodes, 5k+ speakers, 2.4k event sounds, 640 backgrounds, and six contextual instruction types. Experiments in simulation and real-world settings show that RoboOmni surpasses text- and ASR-based baselines in success rate, inference speed, intention recognition, and proactive assistance.

IVNov 11, 2021
Fast T2w/FLAIR MRI Acquisition by Optimal Sampling of Information Complementary to Pre-acquired T1w MRI

Junwei Yang, Xiao-Xin Li, Feihong Liu et al.

Recent studies on T1-assisted MRI reconstruction for under-sampled images of other modalities have demonstrated the potential of further accelerating MRI acquisition of other modalities. Most of the state-of-the-art approaches have achieved improvement through the development of network architectures for fixed under-sampling patterns, without fully exploiting the complementary information between modalities. Although existing under-sampling pattern learning algorithms can be simply modified to allow the fully-sampled T1-weighted MR image to assist the pattern learning, no significant improvement on the reconstruction task can be achieved. To this end, we propose an iterative framework to optimize the under-sampling pattern for MRI acquisition of another modality that can complement the fully-sampled T1-weighted MR image at different under-sampling factors, while jointly optimizing the T1-assisted MRI reconstruction model. Specifically, our proposed method exploits the difference of latent information between the two modalities for determining the sampling patterns that can maximize the assistance power of T1-weighted MR image in improving the MRI reconstruction. We have demonstrated superior performance of our learned under-sampling patterns on a public dataset, compared to commonly used under-sampling patterns and state-of-the-art methods that can jointly optimize both the reconstruction network and the under-sampling pattern, up to 8-fold under-sampling factor.

CVAug 17, 2019
Multi-Kernel Filtering for Nonstationary Noise: An Extension of Bilateral Filtering Using Image Context

Feihong Liu, Jun Feng, Pew-Thian Yap et al.

Bilateral filtering (BF) is one of the most classical denoising filters, however, the manually initialized filtering kernel hampers its adaptivity across images with various characteristics. To deal with image variation (i.e., non-stationary noise), in this paper, we propose multi-kernel filter (MKF) which adapts filtering kernels to specific image characteristics automatically. The design of MKF takes inspiration from adaptive mechanisms of human vision that make full use of information in a visual context. More specifically, for simulating the visual context and its adaptive function, we construct the image context based on which we simulate the contextual impact on filtering kernels. We first design a hierarchically clustering algorithm to generate a hierarchy of large to small coherent image patches, organized as a cluster tree, so that obtain multi-scale image representation. The leaf cluster and corresponding predecessor clusters are used to generate one of multiple range kernels that are capable of catering to image variation. At first, we design a hierarchically clustering framework to generate a hierarchy of large to small coherent image patches that organized as a cluster tree, so that obtain multi-scale image representation, i.e., the image context. Next, a leaf cluster is used to generate one of the multiple kernels, and two corresponding predecessor clusters are used to fine-tune the adopted kernel. Ultimately, the single spatially-invariant kernel in BF becomes multiple spatially-varying ones. We evaluate MKF on two public datasets, BSD300 and BrainWeb which are added integrally-varying noise and spatially-varying noise, respectively. Extensive experiments show that MKF outperforms state-of-the-art filters w.r.t. both mean absolute error and structural similarity.

IVJun 7, 2019
DeepBundle: Fiber Bundle Parcellation with Graph Convolution Neural Networks

Feihong Liu, Jun Feng, Geng Chen et al.

Parcellation of whole-brain tractography streamlines is an important step for tract-based analysis of brain white matter microstructure. Existing fiber parcellation approaches rely on accurate registration between an atlas and the tractograms of an individual, however, due to large individual differences, accurate registration is hard to guarantee in practice. To resolve this issue, we propose a novel deep learning method, called DeepBundle, for registration-free fiber parcellation. Our method utilizes graph convolution neural networks (GCNNs) to predict the parcellation label of each fiber tract. GCNNs are capable of extracting the geometric features of each fiber tract and harnessing the resulting features for accurate fiber parcellation and ultimately avoiding the use of atlases and any registration method. We evaluate DeepBundle using data from the Human Connectome Project. Experimental results demonstrate the advantages of DeepBundle and suggest that the geometric features extracted from each fiber tract can be used to effectively parcellate the fiber tracts.