Feng Dong

LG
15 papers
88 citations
Novelty: 44%
AI Score: 53

15 Papers

CV · Jun 3
Implicit Fuzzification via Bounded Noise Injection for Robust Medical Image Segmentation

Bisheng Tang, Zhangfeng Ma, Chuchu Zhai et al.

Image segmentation remains fundamentally limited by boundary ambiguity arising from sampling-induced information loss and inherent uncertainty in pixel-wise labeling. Although encoder-decoder architectures such as U-Net achieve strong performance, they often produce overconfident predictions that fail to capture transition-region ambiguity. To address this issue, we propose NoiseUNet, a simple yet effective framework that injects bounded perturbations into skip connections to regularize cross-scale feature fusion. This mechanism enforces robustness to local feature variations and promotes boundary-aware representations. Theoretically, the perturbation induces an implicit fuzzification effect, yielding soft, data-driven memberships without requiring explicit fuzzy modeling. We further introduce ThyR, a real-world thyroid ultrasound dataset with inherently ambiguous boundaries. Experiments demonstrate that NoiseUNet consistently improves both segmentation accuracy and boundary fidelity.
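As a rough illustration of what "injecting bounded perturbations into skip connections" could look like, the sketch below adds independent uniform noise in [-eps, eps] to a flat list of skip-connection activations. The additive form, the uniform distribution, and the names `inject_bounded_noise`, `eps` and `training` are assumptions made for this example; the abstract only states that the perturbation is bounded.

```python
import random


def inject_bounded_noise(skip_features, eps, rng=None, training=True):
    """Return a copy of skip_features with additive noise bounded by eps.

    Assumption: i.i.d. uniform noise in [-eps, eps], applied only in training.
    The abstract does not specify the noise distribution.
    """
    if not training or eps == 0:
        return list(skip_features)
    if rng is None:
        rng = random.Random()
    return [x + rng.uniform(-eps, eps) for x in skip_features]


# Hypothetical encoder activations passed along one skip connection.
features = [0.2, -1.3, 0.7, 2.5]
noisy = inject_bounded_noise(features, eps=0.1, rng=random.Random(0))
clean = inject_bounded_noise(features, eps=0.1, training=False)
```

Because every perturbation stays within eps of the original activation, the decoder sees a small neighbourhood of each skip feature rather than a single point, which is one plausible reading of the regularization described above.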

CV · Mar 1, 2023
Underwater Camouflage Object Detection Dataset

Feng Dong, Jinchao Zhu

We present a camouflage object detection dataset aimed mainly at complex seabed scenes, named UnderWater RGB&Sonar, or UW-RS for short. The UW-RS dataset contains 1,972 images in total and consists of two parts: underwater optical data (the UW-R dataset) and underwater sonar data (the UW-S dataset).

LG · Apr 1, 2022
DAG-WGAN: Causal Structure Learning With Wasserstein Generative Adversarial Networks

Hristo Petkov, Colin Hanley, Feng Dong

The combinatorial search space presents a significant challenge to learning causality from data. Recently, the problem has been formulated into a continuous optimization framework with an acyclicity constraint, allowing for the exploration of deep generative models to better capture data sample distributions and support the discovery of Directed Acyclic Graphs (DAGs) that faithfully represent the underlying data distribution. However, so far no study has investigated the use of Wasserstein distance for causal structure learning via generative models. This paper proposes a new model named DAG-WGAN, which combines a Wasserstein-based adversarial loss and an auto-encoder architecture with an acyclicity constraint. DAG-WGAN simultaneously learns causal structures and improves its data generation capability by leveraging the strength of the Wasserstein distance metric. Compared with other models, it scales well and handles both continuous and discrete data. Our experiments have evaluated DAG-WGAN against the state-of-the-art and demonstrated its good performance.
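The acyclicity constraint mentioned above can be made concrete with a small sketch. The abstract does not say which formulation DAG-WGAN uses; the example below assumes a polynomial trace-based penalty, h(W) = tr((I + (W∘W)/d)^d) - d, a form that appears in the continuous DAG-learning literature and is zero exactly when the weighted adjacency matrix W has no directed cycles. The function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(w):
    """h(W) = tr((I + (W*W)/d)^d) - d, with W*W the element-wise square.

    Assumed form for illustration; zero if and only if W encodes a DAG.
    """
    d = len(w)
    m = [[(1.0 if i == j else 0.0) + (w[i][j] ** 2) / d for j in range(d)]
         for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, m)
    return sum(power[i][i] for i in range(d)) - d


# Hypothetical weighted adjacency matrices: w[i][j] is the edge i -> j.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [0.5, 0.0, 0.0]]

h_dag = acyclicity_penalty(dag)
h_cyclic = acyclicity_penalty(cyclic)
```

Since the penalty is a smooth function of W, it can be added to a training loss (here, presumably alongside the adversarial and reconstruction terms) and driven towards zero by gradient-based optimization.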

LG · Apr 13
FlatASCEND: Autoregressive Clinical Sequence Generation with Continuous Time Prediction and Association-Based Pharmacological Testing

Chris Sainsbury, Feng Dong, Andreas Karwath

Autoregressive models can predict clinical events, but generating patient-conditioned multi-step trajectories that respond to intervention tokens and testing whether those responses preserve known pharmacological associations has received limited attention. We present FlatASCEND, a 14.5M-parameter autoregressive clinical sequence model using flat composite tokens and a zero-inflated log-normal time head. Standard distributional metrics (Jaccard 0.889-0.954) do not distinguish FlatASCEND from trivial baselines; the model's value lies in conditional generation from patient-specific prefixes. A prompt-shuffle ablation shows patient-specific conditioning amplifies mechanistic pharmacological effects (2.0-2.2x for steroid to glucose, diuretic to potassium) while leaving confounding-driven associations unchanged (0.9x for insulin to glucose). An incident-user framework assesses directional consistency against prior pharmacological knowledge on MIMIC-IV (N=500 per comparison): 4/10 recover correct mechanistic directions, 2 reproduce treatment-context associations, 4 are incorrect (9/10 significant, Wilcoxon p<0.05). This pattern - partial recovery under residual confounding - is consistent with learned observational associations without causal distinction. Direct preference optimisation with surrogate reward destroys all correct associations (3/3 to 0/3), illustrating reward exploitation when reward and evaluation share an outcome domain. Generative evidence is strongest for short-horizon ICU data; outpatient temporal fidelity is weaker (median 10 vs 154 days on INSPECT), and zero-shot cross-site transfer degrades without adaptation.
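To make the "zero-inflated log-normal time head" concrete, here is a minimal sketch of the negative log-likelihood such a head might minimise for a single inter-event gap. It assumes a point mass `pi_zero` at a gap of exactly zero (simultaneous events) and a log-normal density otherwise; this parameterisation and the function name are assumptions, not details taken from the abstract.

```python
import math


def zero_inflated_lognormal_nll(t, pi_zero, mu, sigma):
    """Negative log-likelihood of a time gap t >= 0.

    Assumed mixture: P(t = 0) = pi_zero; otherwise t ~ LogNormal(mu, sigma).
    """
    if t == 0:
        return -math.log(pi_zero)
    log_pdf = (-math.log(t * sigma * math.sqrt(2 * math.pi))
               - (math.log(t) - mu) ** 2 / (2 * sigma ** 2))
    return -math.log(1 - pi_zero) - log_pdf


# Hypothetical head outputs for two observed gaps.
nll_same_time = zero_inflated_lognormal_nll(0, pi_zero=0.25, mu=0.0, sigma=1.0)
nll_one_unit = zero_inflated_lognormal_nll(1.0, pi_zero=0.5, mu=0.0, sigma=1.0)
```

Separating the zero mass from the continuous part keeps the log-normal term from having to model gaps of exactly zero, where its density is undefined.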

CV · Nov 11, 2022
Interactive Context-Aware Network for RGB-T Salient Object Detection

Yuxuan Wang, Feng Dong, Jinchao Zhu

Salient object detection (SOD) focuses on distinguishing the most conspicuous objects in a scene. However, most related works are based on RGB images alone, which discards much useful information. As thermal technology has matured, RGB-T (RGB-Thermal) multi-modality tasks have attracted increasing attention. Thermal infrared images carry important information that can be used to improve the accuracy of SOD prediction. To exploit it, methods for integrating multi-modal information and suppressing noise are critical. In this paper, we propose a novel network called Interactive Context-Aware Network (ICANet). It contains three modules that effectively perform cross-modal and cross-scale fusion. We design a Hybrid Feature Fusion (HFF) module to integrate the features of the two modalities, which utilizes two types of feature extraction. The Multi-Scale Attention Reinforcement (MSAR) and Upper Fusion (UF) blocks are responsible for the cross-scale fusion that merges features from different levels and generates the prediction maps. We also propose a novel Context-Aware Multi-Supervised Network (CAMSNet) to calculate the content loss between the prediction and the ground truth (GT). Experiments show that our network performs favorably against the state-of-the-art RGB-T SOD methods.

LG · Apr 13
Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction

Chris Sainsbury, Feng Dong, Andreas Karwath

Sparse autoencoders (SAEs) have been applied to large language models and protein language models, but not systematically to electronic health record (EHR) foundation models. We train TopK SAEs on FlatASCEND, a 14.5-million-parameter autoregressive clinical sequence model, at all 10 residual stream extraction points on INSPECT (outpatient) and MIMIC-IV (ICU). SAE decomposition reveals progressive abstraction across transformer depth: layer-0 features are near-perfect token detectors (45.7% singleton), while layer-6 features span approximately 30 token types across multiple clinical categories (0.5% singleton). Under full-sequence simple linear probes, SAE features outperform dense representations for discrete event prediction (mortality) while dense representations outperform for continuous magnitude prediction (length of stay); this probe-level representational phenomenon does not extend to clinically relevant leakage-safe windows, where dense representations match or exceed SAE features across all tested settings (eICU-CRD 48-hour AUC: SAE 0.871 versus dense 0.880; base model zero-shot, SAE dictionaries trained on eICU activations; MIMIC-IV: 0.836 versus 0.914; INSPECT 1-year/3-year: 0.697 versus 0.800). A delta-mode intervention method reduces SAE perturbation noise by 86x, enabling cleaner feature-level experiments, though the resulting perturbation effects, while larger than random controls in 3 of 4 conditions, are not formally significant. Feature reproducibility across random seeds is 21%, and individual features should be interpreted as illustrative rather than stable.
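The defining step of a TopK SAE is its activation: only the k largest latent pre-activations are kept and all others are set to zero, which fixes the number of active features per input. The sketch below shows just that step on a plain list; the function name is hypothetical, and details such as whether a ReLU is applied afterwards vary between implementations and are not stated in the abstract.

```python
def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations and zero out the rest."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    ranked = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(pre_acts)]


# Hypothetical encoder outputs for one residual-stream vector.
latents = topk_activation([0.1, 3.0, -1.0, 2.0], k=2)
```

With k fixed, sparsity is enforced directly by the activation rather than by a tunable penalty term in the loss.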

CV · Jun 4, 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation

Yuxuan Wang, Jinchao Zhu, Feng Dong et al.

Audio and visual signals typically occur simultaneously, and humans possess an innate ability to correlate and synchronize information from these two modalities. Recently, a challenging problem known as Audio-Visual Segmentation (AVS) has emerged, intending to produce segmentation maps for sounding objects within a scene. However, the methods proposed so far have not sufficiently integrated audio and visual information, and the computational costs have been extremely high. Additionally, the outputs of different stages have not been fully utilized. To facilitate this research, we introduce a novel Progressive Confident Masking Attention Network (PCMANet). It leverages attention mechanisms to uncover the intrinsic correlations between audio signals and visual frames. Furthermore, we design an efficient and effective cross-attention module to enhance semantic perception by selecting query tokens. This selection is determined through confidence-driven units based on the network's multi-stage predictive outputs. Experiments demonstrate that our network outperforms other AVS methods while requiring fewer computational resources. The code is available at: https://github.com/PrettyPlate/PCMANet.

LG · Jun 3, 2022
Causality Learning With Wasserstein Generative Adversarial Networks

Hristo Petkov, Colin Hanley, Feng Dong

Conventional methods for causal structure learning from data face significant challenges due to the combinatorial search space. Recently, the problem has been formulated into a continuous optimization framework with an acyclicity constraint to learn Directed Acyclic Graphs (DAGs). Such a framework allows the utilization of deep generative models for causal structure learning to better capture the relations between data sample distributions and DAGs. However, so far no study has experimented with the use of Wasserstein distance in the context of causal structure learning. Our model, DAG-WGAN, combines a Wasserstein-based adversarial loss with an acyclicity constraint in an auto-encoder architecture. It simultaneously learns causal structures while improving its data generation capability. We compare the performance of DAG-WGAN with other models that do not involve the Wasserstein metric in order to identify its contribution to causal structure learning. According to our experiments, our model performs better on high-cardinality data.

LG · May 17, 2022
Hyper-Learning for Gradient-Based Batch Size Adaptation

Calum Robert MacLellan, Feng Dong

Scheduling the batch size to increase is an effective strategy to control gradient noise when training deep neural networks. Current approaches implement scheduling heuristics that neglect structure within the optimization procedure, limiting their flexibility to the training dynamics and capacity to discern the impact of their adaptations on generalization. We introduce Arbiter as a new hyperparameter optimization algorithm to perform batch size adaptations for learnable scheduling heuristics using gradients from a meta-objective function, which overcomes previous heuristic constraints by enforcing a novel learning process called hyper-learning. With hyper-learning, Arbiter formulates a neural network agent to generate optimal batch size samples for an inner deep network by learning an adaptive heuristic through observing concomitant responses over T inner descent steps. Arbiter avoids unrolled optimization, and does not require hypernetworks to facilitate gradients, making it reasonably cheap, simple to implement, and versatile to different tasks. We demonstrate Arbiter's effectiveness in several illustrative experiments: to act as a stand-alone batch size scheduler; to complement fixed batch size schedules with greater flexibility; and to promote variance reduction during stochastic meta-optimization of the learning rate.

LG · Apr 5
DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis

Hristo Petkov, Calum MacLellan, Feng Dong

Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods focus on applying a single identifiable causal model, such as the Additive Noise Model (ANM) or the Linear non-Gaussian Acyclic Model (LiNGAM), to discover the dependencies exhibited in observational data. We improve on this approach by introducing a novel dual-step framework capable of performing both causal structure learning and tabular data synthesis under multiple causal model assumptions. Our approach uses Directed Acyclic Graphs (DAGs) to represent causal relationships among data variables. By applying various functional causal models including ANM, LiNGAM and the Post-Nonlinear model (PNL), we implicitly learn the contents of the DAG to simulate the generative process of observational data, effectively replicating the real data distribution. This is supported by a theoretical analysis to explain the multiple loss terms comprising the objective function of the framework. Experimental results demonstrate that DAGAF outperforms many existing methods in structure learning, achieving significantly lower Structural Hamming Distance (SHD) scores across both real-world and benchmark datasets (Sachs: 47%, Child: 11%, Hailfinder: 5%, Pathfinder: 7% improvement compared to state-of-the-art), while being able to produce diverse, high-quality samples.
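For readers unfamiliar with the Structural Hamming Distance used in these comparisons, a minimal sketch follows. It counts the node pairs on which two directed graphs disagree, so a missing, extra, or reversed edge each adds one. Conventions differ (some variants count a reversal as two errors), and the abstract does not say which one DAGAF reports; the function name and the example graphs are hypothetical.

```python
def structural_hamming_distance(true_edges, learned_edges):
    """Count node pairs where the two directed graphs differ.

    Assumed convention: a reversed edge counts as a single error.
    Edges are (source, target) tuples; self-loops are not expected.
    """
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    pairs = {frozenset(edge) for edge in true_edges | learned_edges}
    distance = 0
    for pair in pairs:
        a, b = tuple(pair)
        both_directions = {(a, b), (b, a)}
        if both_directions & true_edges != both_directions & learned_edges:
            distance += 1
    return distance


# Hypothetical graphs: one reversed edge (A-B) and one extra edge (A-C).
true_graph = [("A", "B"), ("B", "C")]
learned_graph = [("B", "A"), ("B", "C"), ("A", "C")]
shd = structural_hamming_distance(true_graph, learned_graph)
```

A lower SHD means the learned graph is structurally closer to the reference graph, with zero indicating an exact match.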

LG · Oct 30, 2025
Linear Causal Discovery with Interventional Constraints

Zhigao Guo, Feng Dong

Incorporating causal knowledge and mechanisms is essential for refining causal models and improving downstream tasks such as designing new treatments. In this paper, we introduce a novel concept in causal discovery, termed interventional constraints, which differs fundamentally from interventional data. While interventional data require direct perturbations of variables, interventional constraints encode high-level causal knowledge in the form of inequality constraints on causal effects. For instance, in the Sachs dataset (Sachs et al. 2005), Akt has been shown to be activated by PIP3, meaning PIP3 exerts a positive causal effect on Akt. Existing causal discovery methods allow enforcing structural constraints (for example, requiring a causal path from PIP3 to Akt), but they may still produce incorrect causal conclusions such as learning that "PIP3 inhibits Akt". Interventional constraints bridge this gap by explicitly constraining the total causal effect between variable pairs, ensuring learned models respect known causal influences. To formalize interventional constraints, we propose a metric to quantify total causal effects for linear causal models and formulate the problem as a constrained optimization task, solved using a two-stage constrained optimization method. We evaluate our approach on real-world datasets and demonstrate that integrating interventional constraints not only improves model accuracy and ensures consistency with established findings, making models more explainable, but also facilitates the discovery of new causal relationships that would otherwise be costly to identify.
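One standard way to quantify a total causal effect in a linear acyclic model is to sum the products of edge weights over all directed paths, which equals the corresponding entry of W + W^2 + ... + W^(d-1). The sketch below uses that path-sum definition to check a sign constraint of the kind described above (a required positive effect of one variable on another). Whether the paper's metric matches this definition is an assumption, and the weights and function names are made up for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_effects(w):
    """Return T = W + W^2 + ... + W^(d-1) for an acyclic weight matrix W.

    w[i][j] is the direct effect of variable i on variable j, so T[i][j]
    sums the edge-weight products over every directed path from i to j.
    """
    d = len(w)
    total = [[0.0] * d for _ in range(d)]
    power = [row[:] for row in w]
    for _ in range(d - 1):
        total = [[total[i][j] + power[i][j] for j in range(d)] for i in range(d)]
        power = matmul(power, w)
    return total


# Hypothetical model: 0 -> 1 (0.5), 1 -> 2 (2.0), 0 -> 2 (-0.25).
weights = [[0.0, 0.5, -0.25],
           [0.0, 0.0, 2.0],
           [0.0, 0.0, 0.0]]
effect_0_on_2 = total_effects(weights)[0][2]
satisfies_positive_constraint = effect_0_on_2 > 0
```

In this toy model the direct edge from variable 0 to variable 2 is negative, yet the total effect is positive once the mediated path is included, which is why a constraint on total effects says more than a constraint on a single edge or path.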

CV · Aug 13, 2021
Modal-Adaptive Gated Recoding Network for RGB-D Salient Object Detection

Jinchao Zhu, Xiaoyu Zhang, Xian Fang et al.

The multi-modal salient object detection model based on RGB-D information has better robustness in the real world. However, it remains nontrivial to adaptively balance effective multi-modal information in the feature fusion phase. In this letter, we propose a novel gated recoding network (GRNet) to evaluate the information validity of the two modes, and balance their influence. Our framework is divided into three phases: perception phase, recoding mixing phase and feature integration phase. First, a perception encoder is adopted to extract multi-level single-modal features, which lays the foundation for multi-modal semantic comparative analysis. Then, a modal-adaptive gate unit (MGU) is proposed to suppress the invalid information and transfer the effective modal features to the recoding mixer and the hybrid branch decoder. The recoding mixer is responsible for recoding and mixing the balanced multi-modal information. Finally, the hybrid branch decoder completes the multi-level feature integration under the guidance of an optional edge guidance stream (OEGS). Experiments and analysis on eight popular benchmarks verify that our framework performs favorably against 9 state-of-the-art methods.

CV · Jul 27, 2021
Perception-and-Regulation Network for Salient Object Detection

Jinchao Zhu, Xiaoyu Zhang, Xian Fang et al.

Effective fusion of different types of features is the key to salient object detection. Most existing network structure designs are based on the subjective experience of scholars, and the feature fusion process does not consider the relationship between the fused features and the highest-level features. In this paper, we focus on the feature relationship and propose a novel global attention unit, which we term the "perception-and-regulation" (PR) block, that adaptively regulates the feature fusion process by explicitly modeling interdependencies between features. The perception part uses the structure of fully-connected layers in classification networks to learn the size and shape of objects. The regulation part selectively strengthens and weakens the features to be fused. An imitating eye observation module (IEO) is further employed for improving the global perception ability of the network. The imitation of foveal vision and peripheral vision enables IEO to scrutinize highly detailed objects and to organize the broad spatial scene to better segment objects. Extensive experiments conducted on SOD datasets demonstrate that the proposed method performs favorably against 22 state-of-the-art methods.

CR · Feb 5, 2020
MadDroid: Characterising and Detecting Devious Ad Content for Android Apps

Tianming Liu, Haoyu Wang, Li Li et al.

Advertisement drives the economy of the mobile app ecosystem. As a key component in the mobile ad business model, mobile ad content has been overlooked by the research community, which poses a number of threats, e.g., propagating malware and undesirable contents. To understand the practice of these devious ad behaviors, we perform a large-scale study on the app contents harvested through automated app testing. In this work, we first provide a comprehensive categorization of devious ad contents, including five kinds of behaviors belonging to two categories: ad loading content and ad clicking content. Then, we propose MadDroid, a framework for automated detection of devious ad contents. MadDroid leverages an automated app testing framework with a sophisticated ad view exploration strategy for effectively collecting ad-related network traffic and subsequently extracting ad contents. We then integrate dedicated approaches into the framework to identify devious ad contents. We have applied MadDroid to 40,000 Android apps and found that roughly 6% of apps deliver devious ad contents, e.g., distributing malicious apps that cannot be downloaded via traditional app markets. Experiment results indicate that devious ad contents are prevalent, suggesting that our community should invest more effort into the detection and mitigation of devious ads towards building a trustworthy mobile advertising ecosystem.

CR · Sep 5, 2017
FraudDroid: Automated Ad Fraud Detection for Android Apps

Feng Dong, Haoyu Wang, Li Li et al.

Although mobile ad frauds have been widespread, state-of-the-art approaches in the literature have mainly focused on detecting the so-called static placement frauds, where only a single UI state is involved and can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widespread in apps, have been neither explored nor addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy to the research community. We then propose FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects their associated runtime network traffic, which are then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with a high precision (93%) and recall (92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that are further confirmed to be true positive results and are shared with our fellow researchers to promote advanced ad fraud detection.