Da Chen

CV
h-index21
34papers
944citations
Novelty51%
AI Score59

34 Papers

CVMay 28
Geodesics with Unified Tangent-constrained Priors and Curvature Regularization

Chong Di, Li Liu, Jinglin Zhang et al.

Curvature-penalized geodesic models have proven their effectiveness in image segmentation by computing globally optimal curves. Unfortunately, these models remain susceptible to shortcuts when delineating objects with complex shapes and image intensity distributions, as they lack mechanisms to enforce shape-aware tangent constraints. To address this limitation, we propose a unified geodesic framework that integrates tangent-constrained priors with curvature penalization. The key idea is to formulate tangent admissibility directly within the orientation-lifted space, where path tangents are restricted to spatially varying angular sectors derived from intrinsic shape representatives (ISR) such as skeletons or interior landmarks. This formulation gives rise to a family of tangent-constrained Finslerian metrics, extending the classical curvature-penalized geodesic models while enforcing mandatory tangent constraints. The resulting Hamilton-Jacobi-Bellman (HJB) partial differential equations (PDEs) admit efficient numerical solutions via variants of the fast marching method, preserving the single-pass computational complexity. Experiments on synthetic, natural, and medical images demonstrate that the proposed geodesic framework indeed improves robustness against weak boundaries and topological shortcuts, yielding segmentation results with enhanced shape fidelity compared to existing geodesic models.

CVJul 24, 2023Code
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts

Xiaofeng Mao, Yuefeng Chen, Yao Zhu et al. · nvidia

Practical object detection application can lose its effectiveness on image inputs with natural distribution shifts. This problem leads the research community to pay more attention on the robustness of detectors under Out-Of-Distribution (OOD) inputs. Existing works construct datasets to benchmark the detector's OOD robustness for a specific application scenario, e.g., Autonomous Driving. However, these datasets lack universality and are hard to benchmark general detectors built on common tasks such as COCO. To give a more comprehensive robustness assessment, we introduce COCO-O(ut-of-distribution), a test dataset based on COCO with 6 types of natural distribution shifts. COCO-O has a large distribution gap with training data and results in a significant 55.7% relative performance drop on a Faster R-CNN detector. We leverage COCO-O to conduct experiments on more than 100 modern object detectors to investigate if their improvements are credible or just over-fitting to the COCO test set. Unfortunately, most classic detectors in early years do not exhibit strong OOD generalization. We further study the robustness effect on recent breakthroughs of detector's architecture design, augmentation and pre-training techniques. Some empirical findings are revealed: 1) Compared with detection head or neck, backbone is the most important part for robustness; 2) An end-to-end detection transformer design brings no enhancement, and may even reduce robustness; 3) Large-scale foundation models have made a great leap on robust object detection. We hope our COCO-O could provide a rich testbed for robustness study of object detection. The dataset will be available at https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o.

LGNov 2, 2022
AI enhanced finite element multiscale modelling and structural uncertainty analysis of a functionally graded porous beam

Da Chen, Nima Emami, Shahed Rezaei et al.

The local geometrical randomness of metal foams brings complexities to the performance prediction of porous structures. Although the relative density is commonly deemed as the key factor, the stochasticity of internal cell sizes and shapes has an apparent effect on the porous structural behaviour but the corresponding measurement is challenging. To address this issue, we are aimed to develop an assessment strategy for efficiently examining the foam properties by combining multiscale modelling and deep learning. The multiscale modelling is based on the finite element (FE) simulation employing representative volume elements (RVEs) with random cellular morphologies, mimicking the typical features of closed-cell Aluminium foams. A deep learning database is constructed for training the designed convolutional neural networks (CNNs) to establish a direct link between the mesoscopic porosity characteristics and the effective Youngs modulus of foams. The error range of CNN models leads to an uncertain mechanical performance, which is further evaluated in a structural uncertainty analysis on the FG porous three-layer beam consisting of two thin high-density layers and a thick low-density one, where the imprecise CNN predicted moduli are represented as triangular fuzzy numbers in double parametric form. The uncertain beam bending deflections under a mid-span point load are calculated with the aid of Timoshenko beam theory and the Ritz method. Our findings suggest the success in training CNN models to estimate RVE modulus using images with an average error of 5.92%. The evaluation of FG porous structures can be significantly simplified with the proposed method and connects to the mesoscopic cellular morphologies without establishing the mechanics model for local foams.

CVAug 15, 2023
Identity-Consistent Aggregation for Video Object Detection

Chaorui Deng, Da Chen, Qi Wu

In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame. Existing methods treat the temporal contexts obtained from different objects indiscriminately and ignore their different identities. While intuitively, aggregating local views of the same object in different frames may facilitate a better understanding of the object. Thus, in this paper, we aim to enable the model to focus on the identity-consistent temporal contexts of each object to obtain more comprehensive object representations and handle the rapid object appearance variations such as occlusion, motion blur, etc. However, realizing this goal on top of existing VID models faces low-efficiency problems due to their redundant region proposals and nonparallel frame-wise prediction manner. To aid this, we propose ClipVID, a VID model equipped with Identity-Consistent Aggregation (ICA) layers specifically designed for mining fine-grained and identity-consistent temporal contexts. It effectively reduces the redundancies through the set prediction strategy, making the ICA layers very efficient and further allowing us to design an architecture that makes parallel clip-wise predictions for the whole video clip. Extensive experimental results demonstrate the superiority of our method: a state-of-the-art (SOTA) performance (84.7% mAP) on the ImageNet VID dataset while running at a speed about 7x faster (39.3 fps) than previous SOTAs.

AIMay 26
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax, Aili Chen, Aonian Li et al.

We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters with only 9.8B activated per token. Designed end-to-end for agentic deployment, the M2 series rests on three components: (i) agent-driven data pipelines producing large-scale, verifiable trajectories across agentic coding and agentic cowork, each grounded in an executable workspace and an artifact-aligned reward; (ii) Forge, a scalable agent-native RL system that adapts to long-horizon agent trajectories, paired with windowed-FIFO scheduling, prefix-tree merging, inference optimization, and a clean training-inference-agent decoupling that supports both white-box and black-box agents; (iii) the latest M2.7 checkpoint takes an early step toward self-evolution -- autonomously debugging training runs and modifying its own scaffold. Across M2 through M2.7, this combination translates a mini-activation footprint into frontier-tier performance on agentic coding, deep search, office-task, and reasoning benchmarks.

CLApr 19Code
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization

Xiaoyong Mei, Tingting Zuo, Da Chen et al.

Multi-role dialogue summarization requires modeling complex interactions among multiple speakers while preserving role-specific information and factual consistency. However, most existing methods optimize for automatic metrics such as ROUGE and BERTScore, which favor surface-level imitation of references rather than genuine gains in faithfulness or alignment with human preferences. We propose a novel framework that couples explicit cognitive-style reasoning with reward-based optimization for multi-role dialogue summarization. Our method first distills structured reasoning traces (e.g., step-by-step inferences and intermediate reflections) from a large teacher model and uses them as auxiliary supervision to initialize a reasoning-aware summarizer via staged supervised fine-tuning. It then applies GRPO with a dual-principle reward that blends metric-based signals with human-aligned criteria targeting key information coverage, implicit inference, factual faithfulness, and conciseness. Experiments on multilingual multi-role dialogue benchmarks show that our method matches strong baselines on ROUGE and BERTScore. Specifically, results on CSDS confirm the framework's stability in semantic consistency, while in-depth analysis on SAMSum demonstrates clear gains in factual faithfulness and model-based preference alignment. These findings underscore the value of reasoning-aware and preference-aware training for reliable dialogue summarization. Checkpoints and datasets are available at https://huggingface.co/collections/NebulaPixel/summorchestra-multirole-summary.

CVAug 15, 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

Chaorui Deng, Qi Chen, Pengda Qin et al.

In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain. A critical problem for them is how to effectively capture the rich semantics inside the video using the image encoder of CLIP. To tackle this, state-of-the-art methods adopt complex cross-modal modeling techniques to fuse the text information into video frame representations, which, however, incurs severe efficiency issues in large-scale retrieval systems as the video representations must be recomputed online for every text query. In this paper, we discard this problematic cross-modal fusion process and aim to learn semantically-enhanced representations purely from the video, so that the video representations can be computed offline and reused for different texts. Concretely, we first introduce a spatial-temporal "Prompt Cube" into the CLIP image encoder and iteratively switch it within the encoder layers to efficiently incorporate the global video semantics into frame representations. We then propose to apply an auxiliary video captioning objective to train the frame representations, which facilitates the learning of detailed video semantics by providing fine-grained guidance in the semantic space. With a naive temporal fusion strategy (i.e., mean-pooling) on the enhanced frame representations, we obtain state-of-the-art performances on three benchmark datasets, i.e., MSR-VTT, MSVD, and LSMDC.

CVMay 24
Interpretability Transfer from Language to Vision via Sparse Autoencoders

Alexey Kravets, Da Li, Chuan Li et al.

Recent advances in language model interpretability using sparse autoencoders (SAEs) have yet to effectively translate to the visual domain, mainly due to the difficulty and ambiguity of labeling visual concepts. In this paper, we introduce Visual Interpretability via SAE Transfer Alignment (VISTA), a framework that transfers interpretability from language to vision in a LLaVA-style vision-language model by constraining a visual projector to map visual tokens into an LLM's pre-existing, labeled textual SAE space. This approach enables visual interpretability without training dedicated vision SAEs. By regularizing the projector using the LLM's SAE reconstruction loss, VISTA achieves a threefold increase in the matching rate, which measures how accurately the most activating textual concepts in the SAE space correspond to semantic elements in the image. Using this framework, we further analyze spatial localization properties of different vision encoders and show that DINOv2 features have stronger localization abilities than other encoders. Leveraging this precision, we validate VISTA's cross-modal alignment through fine-grained, localized concept interventions, where specific objects are removed or replaced in the model's perception while preserving the surrounding scene. This results in improvements of 35% in object removal and 47% in object replacement tasks over vision-only baselines, providing causal evidence that visual tokens inhabit the text SAE manifold. These contributions are validated across multiple LLM architectures.

NEMay 8
Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts

Zixuan Huang, Da Chen, Kecheng Huang et al.

Generating high-performance GPU kernels remains challenging due to the need for both correctness and hardware-aware optimization. While large language models (LLMs) show promise in code generation, they often fail to produce kernels that are both correct and efficient. We propose Kernel Foundry, a diagnosis-driven evolutionary framework for automatic GPU kernel optimization. Our method combines expert-guided, retrieval-augmented initialization with a multi-island evolutionary search, where candidate kernels are iteratively refined using structured diagnostic feedback. A centralized experience library accumulates reusable optimization knowledge to guide subsequent evolution, while explicit mechanisms prevent cheating behaviors that bypass kernel-level computation. Experiments on KernelBench show that our method consistently improves both correctness and performance over strong baselines, achieving up to 100% correctness on Level~2.

CVSep 8, 2023
Grouping Boundary Proposals for Fast Interactive Image Segmentation

Li Liu, Da Chen, Minglei Shu et al.

Geodesic models are known as an efficient tool for solving various image segmentation problems. Most of existing approaches only exploit local pointwise image features to track geodesic paths for delineating the objective boundaries. However, such a segmentation strategy cannot take into account the connectivity of the image edge features, increasing the risk of shortcut problem, especially in the case of complicated scenario. In this work, we introduce a new image segmentation model based on the minimal geodesic framework in conjunction with an adaptive cut-based circular optimal path computation scheme and a graph-based boundary proposals grouping scheme. Specifically, the adaptive cut can disconnect the image domain such that the target contours are imposed to pass through this cut only once. The boundary proposals are comprised of precomputed image edge segments, providing the connectivity information for our segmentation model. These boundary proposals are then incorporated into the proposed image segmentation model, such that the target segmentation contours are made up of a set of selected boundary proposals and the corresponding geodesic paths linking them. Experimental results show that the proposed model indeed outperforms state-of-the-art minimal paths-based image segmentation approaches.

CLJun 16, 2025Code
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

MiniMax, Aili Chen, Aonian Li et al.

We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems including sandbox-based, real-world software engineering environments. In addition to M1's inherent efficiency advantage for RL training, we propose CISPO, a novel RL algorithm to further enhance RL efficiency. CISPO clips importance sampling weights rather than token updates, outperforming other competitive RL variants. Combining hybrid-attention and CISPO enables MiniMax-M1's full RL training on 512 H800 GPUs to complete in only three weeks, with a rental cost of just $534,700. We release two versions of MiniMax-M1 models with 40K and 80K thinking budgets respectively, where the 40K model represents an intermediate phase of the 80K training. Experiments on standard benchmarks show that our models are comparable or superior to strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex software engineering, tool utilization, and long-context tasks. We publicly release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1.

ARApr 18
SegSEM: Enabling and Enhancing SAM2 for SEM Contour Extraction

Da Chen, Guangyu Hu, Kaihong Xu et al.

Extracting high-fidelity 2D contours from Scanning Electron Microscope (SEM) images is critical for calibrating Optical Proximity Correction (OPC) models. While foundation models like Segment Anything 2 (SAM2) are promising, adapting them to specialized domains with scarce annotated data is a major challenge. This paper presents a case study on adapting SAM2 for SEM contour extraction in a few-shot setting. We propose SegSEM, a framework built on two principles: a data-efficient fine-tuning strategy that adapts by selectively training only the model's encoders, and a robust hybrid architecture integrating a traditional algorithm as a confidence-aware fallback. Using a small dataset of 60 production images, our experiments demonstrate this methodology's viability. The primary contribution is a methodology for leveraging foundation models in data-constrained industrial applications.

LGDec 23, 2019Code
Data-Free Adversarial Distillation

Gongfan Fang, Jie Song, Chengchao Shen et al.

Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large amount of original training data or alternative data, which is usually unavailable in real-world scenarios. In this paper, we devote ourselves to this challenging problem and propose a novel adversarial distillation mechanism to craft a compact student model without any real-world data. We introduce a model discrepancy to quantificationally measure the difference between student and teacher models and construct an optimizable upper bound. In our work, the student and the teacher jointly act the role of the discriminator to reduce this discrepancy, when a generator adversarially produces some "hard samples" to enlarge it. Extensive experiments demonstrate that the proposed data-free method yields comparable performance to existing data-driven methods. More strikingly, our approach can be directly extended to semantic segmentation, which is more complicated than classification, and our approach achieves state-of-the-art results. Code and pretrained models are available at https://github.com/VainF/Data-Free-Adversarial-Distillation.

CVNov 14, 2019Code
Self-Supervised Learning For Few-Shot Image Classification

Da Chen, Yuefeng Chen, Yuhong Li et al.

Few-shot image classification aims to classify unseen classes with limited labelled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta-learning becomes an essential component and can largely affect the performance in practice. To this end, most of the existing methods highly rely on the efficient embedding network. Due to the limited labelled data, the scale of embedding network is constrained under a supervised learning(SL) manner which becomes a bottleneck of the few-shot learning methods. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB) and achieve better performance over baselines. Tests on four datasets in cross-domain few-shot learning classification show that the proposed method achieves state-of-the-art results and further prove the robustness of the proposed model. Our code is available at \hyperref[https://github.com/phecy/SSL-FEW-SHOT.]{https://github.com/phecy/SSL-FEW-SHOT.}

CLJan 14, 2025
MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax, Aonian Li, Bangwei Gong et al.

We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01 is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.

CLOct 8, 2025
ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

Jingbiao Mei, Mingsheng Sun, Jinghong Chen et al.

Hateful memes have emerged as a particularly challenging form of online abuse, motivating the development of automated detection systems. Most prior approaches rely on direct detection, producing only binary predictions. Such models fail to provide the context and explanations that real-world moderation requires. Recent Explain-then-Detect approaches, using Chain-of-Thought prompting or LMM agents, perform worse than simple SFT baselines, and even advanced post-training methods such as GRPO fail to close the gap. Our analysis identifies two key issues of such systems: important policy-relevant cues such as targets and attack types are not hypothesized by the model as a likely explanation; and the binary reward signal is insufficient to guide reasoning. To address these challenges, we propose ExPO-HM (Explain-then-Detect Policy Optimization for Hateful Memes), inspired by the training and evaluation process of human annotators. ExPO-HM combines SFT warmup, GRPO with curriculum learning, and Conditional Decision Entropy (CDE) as both metric and reward for reasoning quality. Across three hateful meme benchmarks, ExPO-HM achieves state-of-the-art performance on binary detection, fine-grained classification, and reasoning quality, with up to 15\% and 17\% F1 improvement over the GRPO and DPO baselines, respectively. By moving hateful meme detection from simple binary alarms to explanation-driven detection, ExPO-HM provides accurate, interpretable, and actionable moderation support.

CVAug 5, 2025
Neovascularization Segmentation via a Multilateral Interaction-Enhanced Graph Convolutional Network

Tao Chen, Dan Zhang, Da Chen et al.

Choroidal neovascularization (CNV), a primary characteristic of wet age-related macular degeneration (wet AMD), represents a leading cause of blindness worldwide. In clinical practice, optical coherence tomography angiography (OCTA) is commonly used for studying CNV-related pathological changes, due to its micron-level resolution and non-invasive nature. Thus, accurate segmentation of CNV regions and vessels in OCTA images is crucial for clinical assessment of wet AMD. However, challenges existed due to irregular CNV shapes and imaging limitations like projection artifacts, noises and boundary blurring. Moreover, the lack of publicly available datasets constraints the CNV analysis. To address these challenges, this paper constructs the first publicly accessible CNV dataset (CNVSeg), and proposes a novel multilateral graph convolutional interaction-enhanced CNV segmentation network (MTG-Net). This network integrates both region and vessel morphological information, exploring semantic and geometric duality constraints within the graph domain. Specifically, MTG-Net consists of a multi-task framework and two graph-based cross-task modules: Multilateral Interaction Graph Reasoning (MIGR) and Multilateral Reinforcement Graph Reasoning (MRGR). The multi-task framework encodes rich geometric features of lesion shapes and surfaces, decoupling the image into three task-specific feature maps. MIGR and MRGR iteratively reason about higher-order relationships across tasks through a graph mechanism, enabling complementary optimization for task-specific objectives. Additionally, an uncertainty-weighted loss is proposed to mitigate the impact of artifacts and noise on segmentation accuracy. Experimental results demonstrate that MTG-Net outperforms existing methods, achieving a Dice socre of 87.21\% for region segmentation and 88.12\% for vessel segmentation.

CVJul 28, 2025
Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting

Alexey Kravets, Da Chen, Vinay P. Namboodiri

CLIP is a foundational model with transferable classification performance in the few-shot setting. Several methods have shown improved performance of CLIP using few-shot examples. However, so far, all these techniques have been benchmarked using standard few-shot datasets. We argue that this mode of evaluation does not provide a true indication of the inductive generalization ability using few-shot examples. As most datasets have been seen by the CLIP model, the resultant setting can be termed as partially transductive. To solve this, we propose a pipeline that uses an unlearning technique to obtain true inductive baselines. In this new inductive setting, the methods show a significant drop in performance (-55% on average among 13 baselines with multiple datasets). We validate the unlearning technique using oracle baselines. An improved few-shot classification technique is proposed that consistently obtains state-of-the-art performance over 13 other recent baseline methods on a comprehensive analysis with 5880 experiments - varying the datasets, differing number of few-shot examples, unlearning setting, and with different seeds. Thus, we identify the issue with the evaluation of CLIP-based few-shot classification, provide a solution using unlearning, propose new benchmarks, and provide an improved method.

CVJun 21, 2025
Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking

Chong Di, Shuwang Zhou, Da Chen et al.

The computation of minimal paths for the applications in tracking tubular structures such as blood vessels and roads is challenged by complex morphologies and environmental variations. Existing approaches can be roughly categorized into two research lines: the point-wise based models and the segment-wise based models. Although segment-wise approaches have obtained promising results in many scenarios, they often suffer from computational inefficiency and heavily rely on a prescribed prior to fit the target elongated shapes. We propose a novel framework that casts segment-wise tracking as a Markov Decision Process (MDP), enabling a reinforcement learning approach. Our method leverages Q-Learning to dynamically explore a graph of segments, computing edge weights on-demand and adaptively expanding the search space. This strategy avoids the high cost of a pre-computed graph and proves robust to incomplete initial information. Experimental reuslts on typical tubular structure datasets demonstrate that our method significantly outperforms state-of-the-art point-wise and segment-wise approaches. The proposed method effectively handles complex topologies and maintains global path coherence without depending on extensive prior structural knowledge.

IVJun 27, 2024
GAPNet: Granularity Attention Network with Anatomy-Prior-Constraint for Carotid Artery Segmentation

Lin Zhang, Chenggang Lu, Xin-yang Shi et al.

Atherosclerosis is a chronic, progressive disease that primarily affects the arterial walls. It is one of the major causes of cardiovascular disease. Magnetic Resonance (MR) black-blood vessel wall imaging (BB-VWI) offers crucial insights into vascular disease diagnosis by clearly visualizing vascular structures. However, the complex anatomy of the neck poses challenges in distinguishing the carotid artery (CA) from surrounding structures, especially with changes like atherosclerosis. In order to address these issues, we propose GAPNet, which is a consisting of a novel geometric prior deduced from.

IVJun 1, 2024
DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation

Jiong Zhang, Qihang Xie, Lei Mou et al.

Cerebrovascular diseases (CVDs) remain a leading cause of global disability and mortality. Digital Subtraction Angiography (DSA) sequences, recognized as the gold standard for diagnosing CVDs, can clearly visualize the dynamic flow and reveal pathological conditions within the cerebrovasculature. Therefore, precise segmentation of cerebral arteries (CAs) and classification between their main trunks and branches are crucial for physicians to accurately quantify diseases. However, achieving accurate CA segmentation in DSA sequences remains a challenging task due to small vessels with low contrast, and ambiguity between vessels and residual skull structures. Moreover, the lack of publicly available datasets limits exploration in the field. In this paper, we introduce a DSA Sequence-based Cerebral Artery segmentation dataset (DSCA), the publicly accessible dataset designed specifically for pixel-level semantic segmentation of CAs. Additionally, we propose DSANet, a spatio-temporal network for CA segmentation in DSA sequences. Unlike existing DSA segmentation methods that focus only on a single frame, the proposed DSANet introduces a separate temporal encoding branch to capture dynamic vessel details across multiple frames. To enhance small vessel segmentation and improve vessel connectivity, we design a novel TemporalFormer module to capture global context and correlations among sequential frames. Furthermore, we develop a Spatio-Temporal Fusion (STF) module to effectively integrate spatial and temporal features from the encoder. Extensive experiments demonstrate that DSANet outperforms other state-of-the-art methods in CA segmentation, achieving a Dice of 0.9033.

CVNov 1, 2021
Geodesic Models with Convexity Shape Prior

Da Chen, Jean-Marie Mirebeau, Minglei Shu et al.

The minimal geodesic models based on the Eikonal equations are capable of finding suitable solutions in various image segmentation scenarios. Existing geodesic-based segmentation approaches usually exploit image features in conjunction with geometric regularization terms, such as Euclidean curve length or curvature-penalized length, for computing geodesic curves. In this paper, we take into account a more complicated problem: finding curvature-penalized geodesic paths with a convexity shape prior. We establish new geodesic models relying on the strategy of orientation-lifting, by which a planar curve can be mapped to an high-dimensional orientation-dependent space. The convexity shape prior serves as a constraint for the construction of local geodesic metrics encoding a particular curvature constraint. Then the geodesic distances and the corresponding closed geodesic paths in the orientation-lifted space can be efficiently computed through state-of-the-art Hamiltonian fast marching method. In addition, we apply the proposed geodesic models to the active contours, leading to efficient interactive image segmentation algorithms that preserve the advantages of convexity shape prior and curvature penalization.

CVAug 16, 2020
Geodesic Paths for Image Segmentation with Implicit Region-based Homogeneity Enhancement

Da Chen, Jian Zhu, Xinxin Zhang et al.

Minimal paths are regarded as a powerful and efficient tool for boundary detection and image segmentation due to its global optimality and the well-established numerical solutions such as fast marching method. In this paper, we introduce a flexible interactive image segmentation model based on the Eikonal partial differential equation (PDE) framework in conjunction with region-based homogeneity enhancement. A key ingredient in the introduced model is the construction of local geodesic metrics, which are capable of integrating anisotropic and asymmetric edge features, implicit region-based homogeneity features and/or curvature regularization. The incorporation of the region-based homogeneity features into the metrics considered relies on an implicit representation of these features, which is one of the contributions of this work. Moreover, we also introduce a way to build simple closed contours as the concatenation of two disjoint open curves. Experimental results prove that the proposed model indeed outperforms state-of-the-art minimal paths-based image segmentation approaches.

CVJun 14, 2020
A Generalized Asymmetric Dual-front Model for Active Contours and Image Segmentation

Da Chen, Jack Spencer, Jean-Marie Mirebeau et al.

The Voronoi diagram-based dual-front active contour models are known as a powerful and efficient way for addressing the image segmentation and domain partitioning problems. In the basic formulation of the dual-front models, the evolving contours can be considered as the interfaces of adjacent Voronoi regions. Among these dual-front models, a crucial ingredient is regarded as the geodesic metrics by which the geodesic distances and the corresponding Voronoi diagram can be estimated. In this paper, we introduce a type of asymmetric quadratic metrics dual-front model. The metrics considered are built by the integration of the image features and a vector field derived from the evolving contours. The use of the asymmetry enhancement can reduce the risk of contour shortcut or leakage problems especially when the initial contours are far away from the target boundaries or the images have complicated intensity distributions. Moreover, the proposed dual-front model can be applied for image segmentation in conjunction with various region-based homogeneity terms. The numerical experiments on both synthetic and real images show that the proposed dual-front model indeed achieves encouraging results.

CVMar 8, 2020
Trajectory Grouping with Curvature Regularization for Tubular Structure Tracking

Li Liu, Da Chen, Minglei Shu et al.

Tubular structure tracking is a crucial task in the fields of computer vision and medical image analysis. The minimal paths-based approaches have exhibited their strong ability in tracing tubular structures, by which a tubular structure can be naturally modeled as a minimal geodesic path computed with a suitable geodesic metric. However, existing minimal paths-based tracing approaches still suffer from difficulties such as the shortcuts and short branches combination problems, especially when dealing with the images involving complicated tubular tree structures or background. In this paper, we introduce a new minimal paths-based model for minimally interactive tubular structure centerline extraction in conjunction with a perceptual grouping scheme. Basically, we take into account the prescribed tubular trajectories and curvature-penalized geodesic paths to seek suitable shortest paths. The proposed approach can benefit from the local smoothness prior on tubular structures and the global optimality of the used graph-based path searching scheme. Experimental results on both synthetic and real images prove that the proposed model indeed obtains outperformance comparing with the state-of-the-art minimal paths-based tubular structure tracing algorithms.

CVDec 20, 2019
A Region-based Randers Geodesic Approach for Image Segmentation

Da Chen, Jean-Marie Mirebeau, Huazhong Shu et al.

The geodesic model based on the eikonal partial differential equation (PDE) has served as a fundamental tool for the applications of image segmentation and boundary detection in the past two decades. However, the existing approaches commonly only exploit the image edge-based features for computing minimal geodesic paths, potentially limiting their performance in complicated segmentation situations. In this paper, we introduce a new variational image segmentation model based on the minimal geodesic path framework and the eikonal PDE, where the region-based appearance term that defines then regional homogeneity features can be taken into account for estimating the associated minimal geodesic paths. This is done by constructing a Randers geodesic metric interpretation of the region-based active contour energy functional. As a result, the minimization of the active contour energy functional is transformed into finding the solution to the Randers eikonal PDE. We also suggest a practical interactive image segmentation strategy, where the target boundary can be delineated by the concatenation of several piecewise geodesic paths. We invoke the Finsler variant of the fast marching method to estimate the geodesic distance map, yielding an efficient implementation of the proposed region-based Randers geodesic model for image segmentation. Experimental results on both synthetic and real images exhibit that our model indeed achieves encouraging segmentation performance.

CVDec 18, 2019
Semantic Regularization: Improve Few-shot Image Classification by Reducing Meta Shift

Da Chen, Yongliang Yang, Zunlei Feng et al.

Few-shot image classification requires the classifier to robustly cope with unseen classes even if there are only a few samples for each class. Recent advances benefit from the meta-learning process where episodic tasks are formed to train a model that can adapt to class change. However, these task sare independent to each other and existing works mainly rely on limited samples of individual support set in a single meta task. This strategy leads to severe meta shift issues across multiple tasks, meaning the learned prototypes or class descriptors are not stable as each task only involves their own support set. To avoid this problem, we propose a concise Semantic RegularizationNetwork to learn a common semantic space under the framework of meta-learning. In this space, all class descriptors can be regularized by the learned semantic basis, which can effectively solve the meta shift problem. The key is to train a class encoder and decoder structure that can encode the sample embedding features into the semantic domain with trained semantic basis, and generate a more stable and general class descriptor from the decoder. We evaluate our work by extensive comparisons with previous methods on three benchmark datasets (MiniImageNet, TieredImageNet, and CUB). The results show that the semantic regularization module improves performance by 4%-7% over the baseline method, and achieves competitive results over the current state-of-the-art models.

CVJul 23, 2019
From Active Contours to Minimal Geodesic Paths: New Solutions to Active Contours Problems by Eikonal Equations

Da Chen, Laurent D. Cohen

In this chapter, we give an overview of part of our previous work based on the minimal path framework and the Eikonal partial differential equation (PDE). We show that by designing adequate Riemannian and Randers geodesic metrics the minimal paths can be utilized to search for solutions to almost all of the active contour problems and to the Euler-Mumford elastica problem, which allows to blend the advantages from minimal geodesic paths and those original approaches, i.e. the active contours and elastica curves. The proposed minimal path-based models can be applied to deal with a broad variety of image analysis tasks such as boundary detection, image segmentation and tubular structure extraction. The numerical implementations for the computation of minimal paths are known to be quite efficient thanks to the Eikonal solvers such as the Finsler variant of the fast marching method.

LGJan 20, 2019
Deep Features Analysis with Attention Networks

Shipeng Xie, Da Chen, Rong Zhang et al.

Deep neural network models have recently draw lots of attention, as it consistently produce impressive results in many computer vision tasks such as image classification, object detection, etc. However, interpreting such model and show the reason why it performs quite well becomes a challenging question. In this paper, we propose a novel method to interpret the neural network models with attention mechanism. Inspired by the heatmap visualization, we analyze the relation between classification accuracy with the attention based heatmap. An improved attention based method is also included and illustrate that a better classifier can be interpreted by the attention based heatmap.

CVSep 21, 2018
Minimal Paths for Tubular Structure Segmentation with Coherence Penalty and Adaptive Anisotropy

Da Chen, Jiong Zhang, Laurent D. Cohen

The minimal path method has proven to be particularly useful and efficient in tubular structure segmentation applications. In this paper, we propose a new minimal path model associated with a dynamic Riemannian metric embedded with an appearance feature coherence penalty and an adaptive anisotropy enhancement term. The features that characterize the appearance and anisotropy properties of a tubular structure are extracted through the associated orientation score. The proposed dynamic Riemannian metric is updated in the course of the geodesic distance computation carried out by the efficient single-pass fast marching method. Compared to state-of-the-art minimal path models, the proposed minimal path model is able to extract the desired tubular structures from a complicated vessel tree structure. In addition, we propose an efficient prior path-based method to search for vessel radius value at each centerline position of the target. Finally, we perform the numerical experiments on both synthetic and real images. The quantitive validation is carried out on retinal vessel images. The results indicate that the proposed model indeed achieves a promising performance.

CGOct 17, 2017
A New Coherence-Penalized Minimal Path Model with Application to Retinal Vessel Centerline Delineation

Da Chen, Laurent D. Cohen

In this paper, we propose a new minimal path model for minimally interactive retinal vessel centerline extraction. The main contribution lies at the construction of a novel coherence-penalized Riemannian metric in a lifted space, dependently of the local geometry of tubularity and an external scalar-valued reference feature map. The globally minimizing curves associated to the proposed metric favour to pass through a set of retinal vessel segments with low variations of the feature map, thus can avoid the short branches combination problem and shortcut problem, commonly suffered by the existing minimal path models in the application of retinal imaging. We validate our model on a series of retinal vessel patches obtained from the DRIVE and IOSTAR datasets, showing that our model indeed get promising results.

CVApr 19, 2017
Learn to Model Motion from Blurry Footages

Wenbin Li, Da Chen, Zhihan Lv et al.

It is difficult to recover the motion field from a real-world footage given a mixture of camera shake and other photometric effects. In this paper we propose a hybrid framework by interleaving a Convolutional Neural Network (CNN) and a traditional optical flow energy. We first conduct a CNN architecture using a novel learnable directional filtering layer. Such layer encodes the angle and distance similarity matrix between blur and camera motion, which is able to enhance the blur features of the camera-shake footages. The proposed CNNs are then integrated into an iterative optical flow framework, which enable the capability of modelling and solving both the blind deconvolution and the optical flow estimation problems simultaneously. Our framework is trained end-to-end on a synthetic dataset and yields competitive precision and performance against the state-of-the-art approaches.

CGDec 1, 2016
Global Minimum for a Finsler Elastica Minimal Path Approach

Da Chen, Jean-Marie Mirebeau, Laurent D. Cohen

In this paper, we propose a novel curvature-penalized minimal path model via an orientation-lifted Finsler metric and the Euler elastica curve. The original minimal path model computes the globally minimal geodesic by solving an Eikonal partial differential equation (PDE). Essentially, this first-order model is unable to penalize curvature which is related to the path rigidity property in the classical active contour models. To solve this problem, we present an Eikonal PDE-based Finsler elastica minimal path approach to address the curvature-penalized geodesic energy minimization problem. We were successful at adding the curvature penalization to the classical geodesic energy. The basic idea of this work is to interpret the Euler elastica bending energy via a novel Finsler elastica metric that embeds a curvature penalty. This metric is non-Riemannian, anisotropic and asymmetric, and is defined over an orientation-lifted space by adding to the image domain the orientation as an extra space dimension. Based on this orientation lifting, the proposed minimal path model can benefit from both the curvature and orientation of the paths. Thanks to the fast marching method, the global minimum of the curvature-penalized geodesic energy can be computed efficiently. We introduce two anisotropic image data-driven speed functions that are computed by steerable filters. Based on these orientation-dependent speed functions, we can apply the proposed Finsler elastica minimal path model to the applications of closed contour detection, perceptual grouping and tubular structure extraction. Numerical experiments on both synthetic and real images show that these applications of the proposed model indeed obtain promising results.

CVSep 7, 2016
Dense Motion Estimation for Smoke

Da Chen, Wenbin Li, Peter Hall

Motion estimation for highly dynamic phenomena such as smoke is an open challenge for Computer Vision. Traditional dense motion estimation algorithms have difficulties with non-rigid and large motions, both of which are frequently observed in smoke motion. We propose an algorithm for dense motion estimation of smoke. Our algorithm is robust, fast, and has better performance over different types of smoke compared to other dense motion estimation algorithms, including state of the art and neural network approaches. The key to our contribution is to use skeletal flow, without explicit point matching, to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this paper we describe our algorithm in greater detail, and provide experimental evidence to support our claims.