CVApr 12, 2023Code
SpectralDiff: A Generative Framework for Hyperspectral Image Classification with Diffusion ModelsNing Chen, Jun Yue, Leyuan Fang et al.
Hyperspectral Image (HSI) classification is an important issue in remote sensing field with extensive applications in earth science. In recent years, a large number of deep learning-based HSI classification methods have been proposed. However, existing methods have limited ability to handle high-dimensional, highly redundant, and complex data, making it challenging to capture the spectral-spatial distributions of data and relationships between samples. To address this issue, we propose a generative framework for HSI classification with diffusion models (SpectralDiff) that effectively mines the distribution information of high-dimensional and highly redundant data by iteratively denoising and explicitly constructing the data generation process, thus better reflecting the relationships between samples. The framework consists of a spectral-spatial diffusion module, and an attention-based classification module. The spectral-spatial diffusion module adopts forward and reverse spectral-spatial diffusion processes to achieve adaptive construction of sample relationships without requiring prior knowledge of graphical structure or neighborhood information. It captures spectral-spatial distribution and contextual information of objects in HSI and mines unsupervised spectral-spatial diffusion features within the reverse diffusion process. Finally, these features are fed into the attention-based classification module for per-pixel classification. The diffusion features can facilitate cross-sample perception via reconstruction distribution, leading to improved classification performance. Experiments on three public HSI datasets demonstrate that the proposed method can achieve better performance than state-of-the-art methods. For the sake of reproducibility, the source code of SpectralDiff will be publicly available at https://github.com/chenning0115/SpectralDiff.
CVMar 28, 2023
Towards Effective Adversarial Textured 3D Meshes on Physical Face RecognitionXiao Yang, Chang Liu, Longlong Xu et al. · microsoft-research, tsinghua
Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before deployed. However, most existing physical attacks are either detectable readily or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. It requires that this technique can simultaneously deceive black-box recognition models and evade defensive mechanisms. To fulfill this, we design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses. However, the mesh-based optimization regime calculates gradients in high-dimensional mesh space, and can be trapped into local optima with unsatisfactory transferability. To deviate from the mesh-based space, we propose to perturb the low-dimensional coefficient space based on 3D Morphable Model, which significantly improves black-box transferability meanwhile enjoying faster search efficiency and better visual quality. Extensive experiments in digital and physical scenarios show that our method effectively explores the security vulnerabilities of multiple popular commercial services, including three recognition APIs, four anti-spoofing APIs, two prevailing mobile phones and two automated access control systems.
LGJun 9, 2022
Towards Safe Reinforcement Learning via Constraining Conditional Value-at-RiskChengyang Ying, Xinning Zhou, Hang Su et al. · tsinghua
Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most of the existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on the analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm of CVaR-Proximal-Policy-Optimization (CPPO) which formalizes the risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
99.1ROApr 13Code
RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated ManipulationShihan Wu, Xuecheng Liu, Shaoxuan Xie et al.
Despite the critical role of bimanual manipulation in endowing robots with human-like dexterity, large-scale and diverse datasets remain scarce due to the significant hardware heterogeneity across bimanual robotic platforms. To bridge this gap, we introduce RoboCOIN, a large-scale multi-embodiment bimanual manipulation dataset comprising over 180,000 demonstrations collected from 15 distinct robotic platforms. Spanning 16 diverse environments-including residential, commercial, and industrial settings-the dataset features 421 bimanual tasks systematically categorized by 39 bimanual collaboration actions and 432 objects. A key innovation of our work is the hierarchical capability pyramid, which provides granular annotations ranging from trajectory-level concepts to segment-level subtasks and frame-level kinematics. Furthermore, we present CoRobot, an efficient data processing pipeline powered by the Robot Trajectory Markup Language (RTML), designed to facilitate quality assessment, automated annotation, and unified multi-embodiment and data management. Extensive experiments demonstrate the effectiveness of RoboCOIN in enhancing the performance of various bimanual manipulation models across a wide spectrum of robotic embodiments. The entire dataset and codebase are fully open-sourced, providing a valuable resource for advancing research in bimanual and multi-embodiment manipulation.
CVJul 16, 2023
Towards Viewpoint-Invariant Visual Recognition via Adversarial TrainingShouwei Ruan, Yinpeng Dong, Hang Su et al.
Visual recognition models are not invariant to viewpoint changes in the 3D world, as different viewing directions can dramatically affect the predictions given the same object. Although many efforts have been devoted to making neural networks invariant to 2D image translations and rotations, viewpoint invariance is rarely investigated. As most models process images in the perspective view, it is challenging to impose invariance to 3D viewpoint changes based only on 2D inputs. Motivated by the success of adversarial training in promoting model robustness, we propose Viewpoint-Invariant Adversarial Training (VIAT) to improve viewpoint robustness of common image classifiers. By regarding viewpoint transformation as an attack, VIAT is formulated as a minimax optimization problem, where the inner maximization characterizes diverse adversarial viewpoints by learning a Gaussian mixture distribution based on a new attack GMVFool, while the outer minimization trains a viewpoint-invariant classifier by minimizing the expected loss over the worst-case adversarial viewpoint distributions. To further improve the generalization performance, a distribution sharing strategy is introduced leveraging the transferability of adversarial viewpoints across objects. Experiments validate the effectiveness of VIAT in improving the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool.
CVJul 21, 2023
Improving Viewpoint Robustness for Visual Recognition via Adversarial TrainingShouwei Ruan, Yinpeng Dong, Hang Su et al.
Viewpoint invariance remains challenging for visual recognition in the 3D world, as altering the viewing directions can significantly impact predictions for the same object. While substantial efforts have been dedicated to making neural networks invariant to 2D image translations and rotations, viewpoint invariance is rarely investigated. Motivated by the success of adversarial training in enhancing model robustness, we propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers. Regarding viewpoint transformation as an attack, we formulate VIAT as a minimax optimization problem, where the inner maximization characterizes diverse adversarial viewpoints by learning a Gaussian mixture distribution based on the proposed attack method GMVFool. The outer minimization obtains a viewpoint-invariant classifier by minimizing the expected loss over the worst-case viewpoint distributions that can share the same one for different objects within the same category. Based on GMVFool, we contribute a large-scale dataset called ImageNet-V+ to benchmark viewpoint robustness. Experimental results show that VIAT significantly improves the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool. Furthermore, we propose ViewRS, a certified viewpoint robustness method that provides a certified radius and accuracy to demonstrate the effectiveness of VIAT from the theoretical perspective.
CVSep 6, 2022
High Speed Rotation Estimation with Dynamic Vision SensorsGuangrong Zhao, Yiran Shen, Ning Chen et al.
Rotational speed is one of the important metrics to be measured for calibrating the electric motors in manufacturing, monitoring engine during car repairing, faults detection on electrical appliance and etc. However, existing measurement techniques either require prohibitive hardware (e.g., high-speed camera) or are inconvenient to use in real-world application scenarios. In this paper, we propose, EV-Tach, an event-based tachometer via efficient dynamic vision sensing on mobile devices. EV-Tach is designed as a high-fidelity and convenient tachometer by introducing dynamic vision sensor as a new sensing modality to capture the high-speed rotation precisely under various real-world scenarios. By designing a series of signal processing algorithms bespoke for dynamic vision sensing on mobile devices, EV-Tach is able to extract the rotational speed accurately from the event stream produced by dynamic vision sensing on rotary targets. According to our extensive evaluations, the Relative Mean Absolute Error (RMAE) of EV-Tach is as low as 0.03% which is comparable to the state-of-the-art laser tachometer under fixed measurement mode. Moreover, EV-Tach is robust to subtle movement of user's hand, therefore, can be used as a handheld device, where the laser tachometer fails to produce reasonable results.
LGAug 8, 2022
Learning-Based Client Selection for Federated Learning Services Over Wireless Networks with Constrained Monetary BudgetsZhipeng Cheng, Xuwei Fan, Minghui Liwang et al.
We investigate a data quality-aware dynamic client selection problem for multiple federated learning (FL) services in a wireless network, where each client offers dynamic datasets for the simultaneous training of multiple FL services, and each FL service demander has to pay for the clients under constrained monetary budgets. The problem is formalized as a non-cooperative Markov game over the training rounds. A multi-agent hybrid deep reinforcement learning-based algorithm is proposed to optimize the joint client selection and payment actions, while avoiding action conflicts. Simulation results indicate that our proposed algorithm can significantly improve training performance.
LGJun 4, 2022
Distributed Machine Learning in D2D-Enabled Heterogeneous Networks: Architectures, Performance, and Open ChallengesZhipeng Cheng, Xuwei Fan, Minghui Liwang et al.
The ever-growing concerns regarding data privacy have led to a paradigm shift in machine learning (ML) architectures from centralized to distributed approaches, giving rise to federated learning (FL) and split learning (SL) as the two predominant privacy-preserving ML mechanisms. However,implementing FL or SL in device-to-device (D2D)-enabled heterogeneous networks with diverse clients presents substantial challenges, including architecture scalability and prolonged training delays. To address these challenges, this article introduces two innovative hybrid distributed ML architectures, namely, hybrid split FL (HSFL) and hybrid federated SL (HFSL). Such architectures combine the strengths of both FL and SL in D2D-enabled heterogeneous wireless networks. We provide a comprehensive analysis of the performance and advantages of HSFL and HFSL, while also highlighting open challenges for future exploration. We support our proposals with preliminary simulations using three datasets in non-independent and non-identically distributed settings, demonstrating the feasibility of our architectures. Our simulations reveal notable reductions in communication/computation costs and training delays as compared to conventional FL and SL.
78.2CVApr 29Code
High-Dimensional Noise to Low-Dimensional Manifolds: A Manifold-Space Diffusion Framework for Degraded Hyperspectral Image ClassificationBoxiang Yang, Ning Chen, Xia Yue et al.
Recently, Hyperspectral Image (HSI) classification has attracted increasing attention in remote sensing. However, HSI data are inherently high-dimensional but low-rank, with discriminative information concentrated on a low-dimensional latent manifold. In real-world remote sensing scenarios, the superposition of multiple degradation factors disrupts this intrinsic manifold structure, driving samples away from their original low-dimensional distribution and introducing substantial redundant and non-discriminative variations. To better handle this challenge, this paper proposes a manifold-space diffusion framework (MSDiff) for robust hyperspectral classification under complex degradation conditions. Specifically, the proposed method first maps high-dimensional, degradation-affected HSI data into a compact low-dimensional manifold through a discriminative spectral-spatial reconstruction task, preserving class semantics and reducing redundant variations. A diffusion-based generative model is then applied to regularize the spectral-spatial distribution within the manifold, enabling progressive refinement and stabilization of latent features against residual degradations. The key advantage of the proposed framework lies in performing diffusion-based distribution modeling directly on the low-dimensional manifold, effectively decoupling degradation-induced disturbances from intrinsic discriminative structures and enhancing representation stability under complex degradations. Experimental results on multiple hyperspectral benchmarks demonstrate consistent performance improvements over state-of-the-art methods under diverse composite degradation settings. The code will be available at https://github.com/yangboxiang1207/MSDiff
SDJun 21, 2023
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech AugmentationZhonghua Liu, Shijun Wang, Ning Chen
Voice Conversion (VC) converts the voice of a source speech to that of a target while maintaining the source's content. Speech can be mainly decomposed into four components: content, timbre, rhythm and pitch. Unfortunately, most related works only take into account content and timbre, which results in less natural speech. Some recent works are able to disentangle speech into several components, but they require laborious bottleneck tuning or various hand-crafted features, each assumed to contain disentangled speech information. In this paper, we propose a VC model that can automatically disentangle speech into four components using only two augmentation functions, without the requirement of multiple hand-crafted features or laborious bottleneck tuning. The proposed model is straightforward yet efficient, and the empirical results demonstrate that our model can achieve a better performance than the baseline, regarding disentanglement effectiveness and speech naturalness.
CVJan 19Code
FGTBT: Frequency-Guided Task-Balancing Transformer for Unified Facial Landmark DetectionJun Wan, Xinyu Xiong, Ning Chen et al.
Recently, deep learning based facial landmark detection (FLD) methods have achieved considerable success. However, in challenging scenarios such as large pose variations, illumination changes, and facial expression variations, they still struggle to accurately capture the geometric structure of the face, resulting in performance degradation. Moreover, the limited size and diversity of existing FLD datasets hinder robust model training, leading to reduced detection accuracy. To address these challenges, we propose a Frequency-Guided Task-Balancing Transformer (FGTBT), which enhances facial structure perception through frequency-domain modeling and multi-dataset unified training. Specifically, we propose a novel Fine-Grained Multi-Task Balancing loss (FMB-loss), which moves beyond coarse task-level balancing by assigning weights to individual landmarks based on their occurrence across datasets. This enables more effective unified training and mitigates the issue of inconsistent gradient magnitudes. Additionally, a Frequency-Guided Structure-Aware (FGSA) model is designed to utilize frequency-guided structure injection and regularization to help learn facial structure constraints. Extensive experimental results on popular benchmark datasets demonstrate that the integration of the proposed FMB-loss and FGSA model into our FGTBT framework achieves performance comparable to state-of-the-art methods. The code is available at https://github.com/Xi0ngxinyu/FGTBT.
SENov 8, 2013Code
Mining Crash Fix PatternsJaechang Nam, Ning Chen
During the life cycle of software development, developers have to fix different kinds of bugs reported by testers or end users. The efficiency and effectiveness of fixing bugs have a huge impact on the reliability of the software as well as the productivity of the development team. Software companies usually spend a large amount of money and human resources on the testing and bug fixing departments. As a result, a better and more reliable way to fix bugs is highly desired by them. In order to achieve such goal, in depth studies on the characteristics of bug fixes from well maintained, highly popular software projects are necessary. In this paper, we study the bug fixing histories extracted from the Eclipse project, a well maintained, highly popular open source project. After analyzing more than 36,000 bugs that belongs to three major kinds of exception types, we are able to reveal some common fix types that are frequently used to fix certain kinds of program exceptions. Our analysis shows that almost all of the exceptions that belong to a certain exception can be fixed by less than ten fix types. Our result implies that most of the bugs in software projects can be and should be fixed by only a few common fix patterns.
LGFeb 6
Enhance and Reuse: A Dual-Mechanism Approach to Boost Deep Forest for Label Distribution LearningJia-Le Xu, Shen-Huan Lyu, Yu-Nian Wang et al.
Label distribution learning (LDL) requires the learner to predict the degree of correlation between each sample and each label. To achieve this, a crucial task during learning is to leverage the correlation among labels. Deep Forest (DF) is a deep learning framework based on tree ensembles, whose training phase does not rely on backpropagation. DF performs in-model feature transform using the prediction of each layer and achieves competitive performance on many tasks. However, its exploration in the field of LDL is still in its infancy. The few existing methods that apply DF to the field of LDL do not have effective ways to utilize the correlation among labels. Therefore, we propose a method named Enhanced and Reused Feature Deep Forest (ERDF). It mainly contains two mechanisms: feature enhancement exploiting label correlation and measure-aware feature reuse. The first one is to utilize the correlation among labels to enhance the original features, enabling the samples to acquire more comprehensive information for the task of LDL. The second one performs a reuse operation on the features of samples that perform worse than the previous layer on the validation set, in order to ensure the stability of the training process. This kind of Enhance-Reuse pattern not only enables samples to enrich their features but also validates the effectiveness of their new features and conducts a reuse process to prevent the noise from spreading further. Experiments show that our method outperforms other comparison algorithms on six evaluation metrics.
NEMar 8
CDEoH: Category-Driven Automatic Algorithm Design With Large Language ModelsYu-Nian Wang, Shen-Huan Lyu, Ning Chen et al.
With the rapid advancement of large language models (LLMs), LLM-based heuristic search methods have demonstrated strong capabilities in automated algorithm generation. However, their evolutionary processes often suffer from instability and premature convergence. Existing approaches mainly address this issue through prompt engineering or by jointly evolving thought and code, while largely overlooking the critical role of algorithmic category diversity in maintaining evolutionary stability. To this end, we propose Category Driven Automatic Algorithm Design with Large Language Models (CDEoH), which explicitly models algorithm categories and jointly balances performance and category diversity in population management, enabling parallel exploration across multiple algorithmic paradigms. Extensive experiments on representative combinatorial optimization problems across multiple scales demonstrate that CDEoH effectively mitigates convergence toward a single evolutionary direction, significantly enhancing evolutionary stability and achieving consistently superior average performance across tasks and scales.
LGMar 3
Breaking the Prototype Bias Loop: Confidence-Aware Federated Contrastive Learning for Highly Imbalanced ClientsTian-Shuang Wu, Shen-Huan Lyu, Ning Chen et al.
Local class imbalance and data heterogeneity across clients often trap prototype-based federated contrastive learning in a prototype bias loop: biased local prototypes induced by imbalanced data are aggregated into biased global prototypes, which are repeatedly reused as contrastive anchors, accumulating errors across communication rounds. To break this loop, we propose Confidence-Aware Federated Contrastive Learning (CAFedCL), a novel framework that improves the prototype aggregation mechanism and strengthens the contrastive alignment guided by prototypes. CAFedCL employs a confidence-aware aggregation mechanism that leverages predictive uncertainty to downweight high-variance local prototypes. In addition, generative augmentation for minority classes and geometric consistency regularization are integrated to stabilize the structure between classes. From a theoretical perspective, we provide an expectation-based analysis showing that our aggregation reduces estimation variance, thereby bounding global prototype drift and ensuring convergence. Extensive experiments under varying levels of class imbalance and data heterogeneity demonstrate that CAFedCL consistently outperforms representative federated baselines in both accuracy and client fairness.
LGJun 20, 2022
A Comparative Study on Application of Class-Imbalance Learning for Severity Prediction of Adverse Events Following ImmunizationNing Chen, Zhengke Sun, Tong Jia
In collaboration with the Liaoning CDC, China, we propose a prediction system to predict the subsequent hospitalization of children with adverse reactions based on data on adverse events following immunization. We extracted multiple features from the data, and selected "hospitalization or not" as the target for classification. Since the data are imbalanced, we used various class-imbalance learning methods for training and improved the RUSBoost algorithm. Experimental results show that the improved RUSBoost has the highest Area Under the ROC Curve on the target among these algorithms. Additionally, we compared these class-imbalance learning methods with some common machine learning algorithms. We combined the improved RUSBoost with dynamic web resource development techniques to build an evaluation system with information entry and vaccination response prediction capabilities for relevant medical practitioners.
60.5CVApr 9
MARINER: A 3E-Driven Benchmark for Fine-Grained Perception and Complex Reasoning in Open-Water EnvironmentsXingming Liao, Ning Chen, Muying Shu et al.
Fine-grained visual understanding and high-level reasoning in real-world open-water environments remain under-explored due to the lack of dedicated benchmarks. We introduce MARINER, a comprehensive benchmark built under the novel Entity-Environment-Event (3E) paradigm. MARINER contains 16,629 multi-source maritime images with 63 fine-grained vessel categories, diverse adverse environments, and 5 typical dynamic maritime incidents, covering fine-grained classification, object detection, and visual question answering tasks. We conduct extensive evaluations on mainstream Multimodal Large language models (MLLMs) and establish baselines, revealing that even advanced models struggle with fine-grained discrimination and causal reasoning in complex marine scenes. As a dedicated maritime benchmark, MARINER fills the gap of realistic and cognitive-level evaluation for maritime multimodal understanding, and promotes future research on robust vision-language models for open-water applications. Appendix and supplementary materials are available at https://lxixim.github.io/MARINER.
ROFeb 12, 2025
CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-WorldYankai Fu, Qiuxuan Feng, Ning Chen et al.
Achieving human-level dexterity in robots is a key objective in the field of robotic manipulation. Recent advancements in 3D-based imitation learning have shown promising results, providing an effective pathway to achieve this goal. However, obtaining high-quality 3D representations presents two key problems: (1) the quality of point clouds captured by a single-view camera is significantly affected by factors such as camera resolution, positioning, and occlusions caused by the dexterous hand; (2) the global point clouds lack crucial contact information and spatial correspondences, which are necessary for fine-grained dexterous manipulation tasks. To eliminate these limitations, we propose CordViP, a novel framework that constructs and learns correspondences by leveraging the robust 6D pose estimation of objects and robot proprioception. Specifically, we first introduce the interaction-aware point clouds, which establish correspondences between the object and the hand. These point clouds are then used for our pre-training policy, where we also incorporate object-centric contact maps and hand-arm coordination information, effectively capturing both spatial and temporal dynamics. Our method demonstrates exceptional dexterous manipulation capabilities, achieving state-of-the-art performance in six real-world tasks, surpassing other baselines by a large margin. Experimental results also highlight the superior generalization and robustness of CordViP to different objects, viewpoints, and scenarios. Code and videos are available on https://aureleopku.github.io/CordViP.
DCJan 5, 2024
Towards Integrated Fine-tuning and Inference when Generative AI meets Edge IntelligenceNing Chen, Zhipeng Cheng, Xuwei Fan et al.
The high-performance generative artificial intelligence (GAI) represents the latest evolution of computational intelligence, while the blessing of future 6G networks also makes edge intelligence (EI) full of development potential. The inevitable encounter between GAI and EI can unleash new opportunities, where GAI's pre-training based on massive computing resources and large-scale unlabeled corpora can provide strong foundational knowledge for EI, while EI can harness fragmented computing resources to aggregate personalized knowledge for GAI. However, the natural contradictory features pose significant challenges to direct knowledge sharing. To address this, in this paper, we propose the GAI-oriented synthetical network (GaisNet), a collaborative cloud-edge-end intelligence framework that buffers contradiction leveraging data-free knowledge relay, where the bidirectional knowledge flow enables GAI's virtuous-cycle model fine-tuning and task inference, achieving mutualism between GAI and EI with seamless fusion and collaborative evolution. Experimental results demonstrate the effectiveness of the proposed mechanisms. Finally, we discuss the future challenges and directions in the interplay between GAI and EI.
CVDec 17, 2024
RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image ClassificationGuangwenjie Zou, Liang Yao, Fan Liu et al.
Since high resolution remote sensing image classification often requires a relatively high computation complexity, lightweight models tend to be practical and efficient. Model pruning is an effective method for model compression. However, existing methods rarely take into account the specificity of remote sensing images, resulting in significant accuracy loss after pruning. To this end, we propose an effective structural pruning approach for remote sensing image classification. Specifically, a pruning strategy that amplifies the differences in channel importance of the model is introduced. Then an adaptive mining loss function is designed for the fine-tuning process of the pruned model. Finally, we conducted experiments on two remote sensing classification datasets. The experimental results demonstrate that our method achieves minimal accuracy loss after compressing remote sensing classification models, achieving state-of-the-art (SoTA) performance.
IRMay 24, 2025
Improving Ad matching via Cluster-Adaptive Keyword Expansion and Relevance tuningDipanwita Saha, Anis Zaman, Hua Zou et al.
In search advertising, keyword matching connects user queries with relevant ads. While token-based matching increases ad coverage, it can reduce relevance due to overly permissive semantic expansion. This work extends keyword reach through document-side semantic keyword expansion, using a language model to broaden token-level matching without altering queries. We propose a solution using a pre-trained siamese model to generate dense vector representations of ad keywords and identify semantically related variants through nearest neighbor search. To maintain precision, we introduce a cluster-based thresholding mechanism that adjusts similarity cutoffs based on local semantic density. Each expanded keyword maps to a group of seller-listed items, which may only partially align with the original intent. To ensure relevance, we enhance the downstream relevance model by adapting it to the expanded keyword space using an incremental learning strategy with a lightweight decision tree ensemble. This system improves both relevance and click-through rate (CTR), offering a scalable, low-latency solution adaptable to evolving query behavior and advertising inventory.
LGJan 31, 2025
Improving Multi-Label Contrastive Learning by Leveraging Label DistributionNing Chen, Shen-Huan Lyu, Tian-Shuang Wu et al.
In multi-label learning, leveraging contrastive learning to learn better representations faces a key challenge: selecting positive and negative samples and effectively utilizing label information. Previous studies selected positive and negative samples based on the overlap between labels and used them for label-wise loss balancing. However, these methods suffer from a complex selection process and fail to account for the varying importance of different labels. To address these problems, we propose a novel method that improves multi-label contrastive learning through label distribution. Specifically, when selecting positive and negative samples, we only need to consider whether there is an intersection between labels. To model the relationships between labels, we introduce two methods to recover label distributions from logical labels, based on Radial Basis Function (RBF) and contrastive loss, respectively. We evaluate our method on nine widely used multi-label datasets, including image and vector datasets. The results demonstrate that our method outperforms state-of-the-art methods in six evaluation metrics.
RONov 21, 2025
METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action ModelYankai Fu, Ning Chen, Junkai Zhao et al.
Building a generalist robot that can perceive, reason, and act across diverse tasks remains an open challenge, especially for dexterous manipulation. A major bottleneck lies in the scarcity of large-scale, action-annotated data for dexterous skills, as teleoperation is difficult and costly. Human data, with its vast scale and diverse manipulation behaviors, provides rich priors for learning robotic actions. While prior works have explored leveraging human demonstrations, they are often constrained by limited scenarios and a large visual gap between human and robots. To eliminate these limitations, we propose METIS, a vision-language-action (VLA) model for dexterous manipulation pretrained on multi-source egocentric datasets. We first construct EgoAtlas, which integrates large-scale human and robotic data from multiple sources, all unified under a consistent action space. We further extract motion-aware dynamics, a compact and discretized motion representation, which provides efficient and expressive supervision for VLA training. Built upon them, METIS integrates reasoning and acting into a unified framework, enabling effective deployment to downstream dexterous manipulation tasks. Our method demonstrates exceptional dexterous manipulation capabilities, achieving highest average success rate in six real-world tasks. Experimental results also highlight the superior generalization and robustness to out-of-distribution scenarios. These findings emphasize METIS as a promising step toward a generalist model for dexterous manipulation.
CVOct 27, 2025
Survey of Multimodal Geospatial Foundation Models: Techniques, Applications, and ChallengesLiling Yang, Ning Chen, Jun Yue et al.
Foundation models have transformed natural language processing and computer vision, and their impact is now reshaping remote sensing image analysis. With powerful generalization and transfer learning capabilities, they align naturally with the multimodal, multi-resolution, and multi-temporal characteristics of remote sensing data. To address unique challenges in the field, multimodal geospatial foundation models (GFMs) have emerged as a dedicated research frontier. This survey delivers a comprehensive review of multimodal GFMs from a modality-driven perspective, covering five core visual and vision-language modalities. We examine how differences in imaging physics and data representation shape interaction design, and we analyze key techniques for alignment, integration, and knowledge transfer to tackle modality heterogeneity, distribution shifts, and semantic gaps. Advances in training paradigms, architectures, and task-specific adaptation strategies are systematically assessed alongside a wealth of emerging benchmarks. Representative multimodal visual and vision-language GFMs are evaluated across ten downstream tasks, with insights into their architectures, performance, and application scenarios. Real-world case studies, spanning land cover mapping, agricultural monitoring, disaster response, climate studies, and geospatial intelligence, demonstrate the practical potential of GFMs. Finally, we outline pressing challenges in domain generalization, interpretability, efficiency, and privacy, and chart promising avenues for future research.
CVSep 10, 2025
HyperTTA: Test-Time Adaptation for Hyperspectral Image Classification under Distribution ShiftsXia Yue, Anfeng Liu, Ning Chen et al.
Hyperspectral image (HSI) classification models are highly sensitive to distribution shifts caused by real-world degradations such as noise, blur, compression, and atmospheric effects. To address this challenge, we propose HyperTTA (Test-Time Adaptable Transformer for Hyperspectral Degradation), a unified framework that enhances model robustness under diverse degradation conditions. First, we construct a multi-degradation hyperspectral benchmark that systematically simulates nine representative degradations, enabling comprehensive evaluation of robust classification. Based on this benchmark, we develop a Spectral--Spatial Transformer Classifier (SSTC) with a multi-level receptive field mechanism and label smoothing regularization to capture multi-scale spatial context and improve generalization. Furthermore, we introduce a lightweight test-time adaptation strategy, the Confidence-aware Entropy-minimized LayerNorm Adapter (CELA), which dynamically updates only the affine parameters of LayerNorm layers by minimizing prediction entropy on high-confidence unlabeled target samples. This strategy ensures reliable adaptation without access to source data or target labels. Experiments on two benchmark datasets demonstrate that HyperTTA outperforms state-of-the-art baselines across a wide range of degradation scenarios. Code will be made available publicly.
CVJul 6, 2025
Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided OptimizationZuyu Zhang, Ning Chen, Yongshan Liu et al.
Single Domain Generalization (SDG) aims to develop models capable of generalizing to unseen target domains using only one source domain, a task complicated by substantial domain shifts and limited data diversity. Existing SDG approaches primarily rely on data augmentation techniques, which struggle to effectively adapt training dynamics to accommodate large domain shifts. To address this, we propose LEAwareSGD, a novel Lyapunov Exponent (LE)-guided optimization approach inspired by dynamical systems theory. By leveraging LE measurements to modulate the learning rate, LEAwareSGD encourages model training near the edge of chaos, a critical state that optimally balances stability and adaptability. This dynamic adjustment allows the model to explore a wider parameter space and capture more generalizable features, ultimately enhancing the model's generalization capability. Extensive experiments on PACS, OfficeHome, and DomainNet demonstrate that LEAwareSGD yields substantial generalization gains, achieving up to 9.47\% improvement on PACS in low-data regimes. These results underscore the effectiveness of training near the edge of chaos for enhancing model generalization capability in SDG tasks.
CVApr 10, 2025
Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-ReconstructionQingchao Jiang, Zhishuo Xu, Zhiying Zhu et al.
Advances in image generation enable hyper-realistic synthetic faces but also pose risks, thus making synthetic face detection crucial. Previous research focuses on the general differences between generated images and real images, often overlooking the discrepancies among various generative techniques. In this paper, we explore the intrinsic relationship between synthetic images and their corresponding generation technologies. We find that specific images exhibit significant reconstruction discrepancies across different generative methods and that matching generation techniques provide more accurate reconstructions. Based on this insight, we propose a Multi-Reconstruction-based detector. By reversing and reconstructing images using multiple generative models, we analyze the reconstruction differences among real, GAN-generated, and DM-generated images to facilitate effective differentiation. Additionally, we introduce the Asian Synthetic Face Dataset (ASFD), containing synthetic Asian faces generated with various GANs and DMs. This dataset complements existing synthetic face datasets. Experimental results demonstrate that our detector achieves exceptional performance, with strong generalization and robustness.
LGFeb 22, 2025
Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference ServicesZhipeng Cheng, Xiaoyu Xia, Hong Wang et al.
Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces additional challenges, including limited computation/storage resources, dynamic service demands, and heightened privacy risks. To tackle these issues, this paper presents a novel privacy-aware optimization framework that jointly addresses DNN model deployment, user-server association, and model partitioning, with the goal of minimizing long-term average inference delay under resource and privacy constraints. The problem is formulated as a complex, NP-hard stochastic optimization. To efficiently handle system dynamics and computational complexity, we employ a Lyapunov-based approach to transform the long-term objective into tractable per-slot decisions. Furthermore, we introduce a coalition formation game to enable adaptive user-server association and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay and consistently satisfies privacy constraints, outperforming state-of-the-art baselines across diverse scenarios.
LGFeb 9, 2025
Compressing Model with Few Class-Imbalance Samples: An Out-of-Distribution ExpeditionTian-Shuang Wu, Shen-Huan Lyu, Ning Chen et al.
In recent years, as a compromise between privacy and performance, few-sample model compression has been widely adopted to deal with limited data resulting from privacy and security concerns. However, when the number of available samples is extremely limited, class imbalance becomes a common and tricky problem. Achieving an equal number of samples across all classes is often costly and impractical in real-world applications, and previous studies on few-sample model compression have mostly ignored this significant issue. Our experiments comprehensively demonstrate that class imbalance negatively affects the overall performance of few-sample model compression methods. To address this problem, we propose a novel and adaptive framework named OOD-Enhanced Few-Sample Model Compression (OE-FSMC). This framework integrates easily accessible out-of-distribution (OOD) data into both the compression and fine-tuning processes, effectively rebalancing the training distribution. We also incorporate a joint distillation loss and a regularization term to reduce the risk of the model overfitting to the OOD data. Extensive experiments on multiple benchmark datasets show that our framework can be seamlessly incorporated into existing few-sample model compression methods, effectively mitigating the accuracy degradation caused by class imbalance.
LGOct 20, 2021
One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast CancerYongquan Yang, Fengling Li, Yani Wei et al.
Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning, including data-driven logical reasoning, knowledge driven machine learning and abductive learning, in inventing advanced technologies for different artificial intelligence applications. One-step abductive multi-target learning (OSAMTL), an approach inspired by abductive learning, via simply combining machine learning and logical reasoning in a one-step balanced multi-target learning way, has as well shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for the situation where diverse noisy samples (DiNS) are provided for a learning task. In this paper, giving definition of DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to expand the original OSAMTL to handle complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS is able to enable various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions. We released a model pre-trained with OSAMTL-DiNS for tumour segmentation in HE-stained pre-treatment biopsy images in breast cancer, which has been successfully applied as a pre-processing tool to extract tumour-associated stroma compartment for predicting the pathological complete response to neoadjuvant chemotherapy in breast cancer.
LGMay 18, 2021
StackVAE-G: An efficient and interpretable model for time series anomaly detectionWenkai Li, Wenbo Hu, Ting Chen et al.
Recent studies have shown that autoencoder-based models can achieve superior performance on anomaly detection tasks due to their excellent ability to fit complex data in an unsupervised manner. In this work, we propose a novel autoencoder-based model, named StackVAE-G that can significantly bring the efficiency and interpretability to multivariate time series anomaly detection. Specifically, we utilize the similarities across the time series channels by the stacking block-wise reconstruction with a weight-sharing scheme to reduce the size of learned models and also relieve the overfitting to unknown noises in the training data. We also leverage a graph learning module to learn a sparse adjacency matrix to explicitly capture the stable interrelation structure among multiple time series channels for the interpretable pattern reconstruction of interrelated channels. Combining these two modules, we introduce the stacking block-wise VAE (variational autoencoder) with GNN (graph neural network) model for multivariate time series anomaly detection. We conduct extensive experiments on three commonly used public datasets, showing that our model achieves comparable (even better) performance with the state-of-the-art modelsand meanwhile requires much less computation and memory cost. Furthermore, we demonstrate that the adjacency matrix learned by our model accurately captures the interrelation among multiple channels, and can provide valuable information for failure diagnosis applications.
LGJan 21, 2021
A Survey on Ensemble Learning under the Era of Deep LearningYongquan Yang, Haijun Lv, Ning Chen
Due to the dominant position of deep learning (mostly deep neural networks) in various artificial intelligence applications, recently, ensemble learning based on deep neural networks (ensemble deep learning) has shown significant performances in improving the generalization of learning system. However, since modern deep neural networks usually have millions to billions of parameters, the time and space overheads for training multiple base deep learners and testing with the ensemble deep learner are far greater than that of traditional ensemble learning. Though several algorithms of fast ensemble deep learning have been proposed to promote the deployment of ensemble deep learning in some applications, further advances still need to be made for many applications in specific fields, where the developing time and computing resources are usually restricted or the data to be processed is of large dimensionality. An urgent problem needs to be solved is how to take the significant advantages of ensemble deep learning while reduce the required expenses so that many more applications in specific fields can benefit from it. For the alleviation of this problem, it is essential to know about how ensemble learning has developed under the era of deep learning. Thus, in this article, we present fundamental discussions focusing on data analyses of published works, methodologies, recent advances and unattainability of traditional ensemble learning and ensemble deep learning. We hope this article will be helpful to realize the intrinsic problems and technical challenges faced by future developments of ensemble learning under the era of deep learning.
HCOct 31, 2020
Visual Companion for BookloversZona Kostic, Jared Jessup, Jeffrey Baglioni et al.
An innumerable number of individual choices go into discovering a new book. There are unmistakably two groups of booklovers: those who like to search online, follow other people's latest readings, or simply react to a system's recommendations; and those who love to wander between library stacks, lose themselves behind bookstore shelves, or simply hide behind piles of (un)organized books. Depending on which group a person may fall into, there are two distinct and corresponding mediums that inform his or her choices: digital, that provides efficient retrieval of information online, and physical, a more tactile pursuit that leads to unexpected discoveries and promotes serendipity. How could we possibly bridge the gap between these seemingly disparate mediums into an integrated system that can amplify the benefits they both offer? In this paper, we present the BookVIS application, which uses book-related data and generates personalized visualizations to follow users in their quest for a new book. In this new redesigned version, the app brings associative visual connections to support intuitive exploration of easily retrieved digital information and its relationship with the physical book in hand. BookVIS keeps track of the user's reading preferences and generates a dataSelfie as an individual snapshot of a personal taste that grows over time. Usability testing has also been conducted and has demonstrated the app's ability to identify distinguishable patterns in readers' tastes that could be further used to communicate personal preferences in new "shelf-browsing" iterations. By efficiently supplementing the user's cognitive information needs while still supporting the spontaneity and enjoyment of the book browsing experience, BookVIS bridges the gap between real and online realms, and maximizes the engagement of personalized mobile visual clues.
CVDec 25, 2019
Learn to Segment Retinal Lesions and BeyondQijie Wei, Xirong Li, Weihong Yu et al.
Towards automated retinal screening, this paper makes an endeavor to simultaneously achieve pixel-level retinal lesion segmentation and image-level disease classification. Such a multi-task approach is crucial for accurate and clinically interpretable disease diagnosis. Prior art is insufficient due to three challenges, i.e., lesions lacking objective boundaries, clinical importance of lesions irrelevant to their size, and the lack of one-to-one correspondence between lesion and disease classes. This paper attacks the three challenges in the context of diabetic retinopathy (DR) grading. We propose Lesion-Net, a new variant of fully convolutional networks, with its expansive path re-designed to tackle the first challenge. A dual Dice loss that leverages both semantic segmentation and image classification losses is introduced to resolve the second challenge. Lastly, we build a multi-task network that employs Lesion-Net as a side-attention branch for both DR grading and result interpretation. A set of 12K fundus images is manually segmented by 45 ophthalmologists for 8 DR-related lesions, resulting in 290K manual segments in total. Extensive experiments on this large-scale dataset show that our proposed approach surpasses the prior art for multiple tasks including lesion segmentation, lesion classification and DR grading
LGMay 25, 2019
Rethinking Softmax Cross-Entropy Loss for Adversarial RobustnessTianyu Pang, Kun Xu, Yinpeng Dong et al.
Previous work shows that adversarially robust generalization requires larger sample complexity, and the same dataset, e.g., CIFAR-10, which enables good standard accuracy may not suffice to train robust models. Since collecting new training data could be costly, we focus on better utilizing the given data by inducing the regions with high sample density in the feature space, which could lead to locally sufficient samples for robust learning. We first formally show that the softmax cross-entropy (SCE) loss and its variants convey inappropriate supervisory signals, which encourage the learned feature points to spread over the space sparsely in training. This inspires us to propose the Max-Mahalanobis center (MMC) loss to explicitly induce dense feature regions in order to benefit robustness. Namely, the MMC loss encourages the model to concentrate on learning ordered and compact representations, which gather around the preset optimal centers for different classes. We empirically demonstrate that applying the MMC loss can significantly improve robustness even under strong adaptive attacks, while keeping state-of-the-art accuracy on clean inputs with little extra computation compared to the SCE loss.
LGJan 25, 2019
Improving Adversarial Robustness via Promoting Ensemble DiversityTianyu Pang, Kun Xu, Chao Du et al.
Though deep neural networks have achieved significant progress on various tasks, often enhanced by model ensemble, existing high-performance models can be vulnerable to adversarial attacks. Many efforts have been devoted to enhancing the robustness of individual networks and then constructing a straightforward ensemble, e.g., by directly averaging the outputs, which ignores the interaction among networks. This paper presents a new method that explores the interaction among individual networks to improve robustness for ensemble models. Technically, we define a new notion of ensemble diversity in the adversarial setting as the diversity among non-maximal predictions of individual members, and present an adaptive diversity promoting (ADP) regularizer to encourage the diversity, which leads to globally better robustness for the ensemble by making adversarial examples difficult to transfer among individual members. Our method is computationally efficient and compatible with the defense methods acting on individual networks. Empirical results on various datasets verify that our method can improve adversarial robustness while maintaining state-of-the-art accuracy on normal examples.
CVNov 30, 2018
Transferable Adversarial Attacks for Image and Video Object DetectionXingxing Wei, Siyuan Liang, Ning Chen et al.
Adversarial examples have been demonstrated to threaten many computer vision tasks including object detection. However, the existing attacking methods for object detection have two limitations: poor transferability, which denotes that the generated adversarial examples have low success rate to attack other kinds of detection methods, and high computation cost, which means that they need more time to generate an adversarial image, and therefore are difficult to deal with the video data. To address these issues, we utilize a generative mechanism to obtain the adversarial image and video. In this way, the processing time is reduced. To enhance the transferability, we destroy the feature maps extracted from the feature network, which usually constitutes the basis of object detectors. The proposed method is based on the Generative Adversarial Network (GAN) framework, where we combine the high-level class loss and low-level feature loss to jointly train the adversarial example generator. A series of experiments conducted on PASCAL VOC and ImageNet VID datasets show that our method can efficiently generate image and video adversarial examples, and more importantly, these adversarial examples have better transferability, and thus, are able to simultaneously attack two kinds of representative object detection models: proposal based models like Faster-RCNN, and regression based models like SSD.
MLNov 13, 2017
Message Passing Stein Variational Gradient DescentJingwei Zhuo, Chang Liu, Jiaxin Shi et al.
Stein variational gradient descent (SVGD) is a recently proposed particle-based Bayesian inference method, which has attracted a lot of interest due to its remarkable approximation ability and particle efficiency compared to traditional variational inference and Markov Chain Monte Carlo methods. However, we observed that particles of SVGD tend to collapse to modes of the target distribution, and this particle degeneracy phenomenon becomes more severe with higher dimensions. Our theoretical analysis finds out that there exists a negative correlation between the dimensionality and the repulsive force of SVGD which should be blamed for this phenomenon. We propose Message Passing SVGD (MP-SVGD) to solve this problem. By leveraging the conditional independence structure of probabilistic graphical models (PGMs), MP-SVGD converts the original high-dimensional global inference problem into a set of local ones over the Markov blanket with lower dimensions. Experimental results show its advantages of preventing vanishing repulsive force in high-dimensional space over SVGD, and its particle efficiency and approximation flexibility over other inference methods on graphical models.
CRSep 23, 2017
A Grassmannian Approach to Zero-Shot Learning for Network Intrusion DetectionJorge Rivero, Bernardete Ribeiro, Ning Chen et al.
One of the main problems in Network Intrusion Detection comes from constant rise of new attacks, so that not enough labeled examples are available for the new classes of attacks. Traditional Machine Learning approaches hardly address such problem. This can be overcome with Zero-Shot Learning, a new approach in the field of Computer Vision, which can be described in two stages: the Attribute Learning and the Inference Stage. The goal of this paper is to propose a new Inference Stage algorithm for Network Intrusion Detection. In order to attain this objective, we firstly put forward an experimental setup for the evaluation of the Zero-Shot Learning in Network Intrusion Detection related tasks. Secondly, a decision tree based algorithm is applied to extract rules for generating the attributes in the AL stage. Finally, using a representation of a Zero-Shot Class as a point in the Grassmann manifold, an explicit formula for the shortest distance between points in that manifold can be used to compute the geodesic distance between the Zero-Shot Classes which represent the new attacks and the Known Classes corresponding to the attack categories. The experimental results in the datasets KDD Cup 99 and NSL-KDD show that our approach with Zero-Shot Learning successfully addresses the Network Intrusion Detection problem.
CLJul 1, 2017
SAM: Semantic Attribute Modulation for Language Modeling and Style VariationWenbo Hu, Lifeng Hua, Lei Li et al.
This paper presents a Semantic Attribute Modulation (SAM) for language modeling and style variation. The semantic attribute modulation includes various document attributes, such as titles, authors, and document categories. We consider two types of attributes, (title attributes and category attributes), and a flexible attribute selection scheme by automatically scoring them via an attribute attention mechanism. The semantic attributes are embedded into the hidden semantic space as the generation inputs. With the attributes properly harnessed, our proposed SAM can generate interpretable texts with regard to the input attributes. Qualitative analysis, including word semantic analysis and attention values, shows the interpretability of SAM. On several typical text datasets, we empirically demonstrate the superiority of the Semantic Attribute Modulated language model with different combinations of document attributes. Moreover, we present a style variation for the lyric generation using SAM, which shows a strong connection between the style variation and the semantic attributes.
LGDec 7, 2015
Discriminative Nonparametric Latent Feature Relational Models with Data AugmentationBei Chen, Ning Chen, Jun Zhu et al.
We present a discriminative nonparametric latent feature relational model (LFRM) for link prediction to automatically infer the dimensionality of latent features. Under the generic RegBayes (regularized Bayesian inference) framework, we handily incorporate the prediction loss with probabilistic inference of a Bayesian model; set distinct regularization parameters for different types of links to handle the imbalance issue in real networks; and unify the analysis of both the smooth logistic log-loss and the piecewise linear hinge loss. For the nonconjugate posterior inference, we present a simple Gibbs sampler via data augmentation, without making restricting assumptions as done in variational methods. We further develop an approximate sampler using stochastic gradient Langevin dynamics to handle large networks with hundreds of thousands of entities and millions of links, orders of magnitude larger than what existing LFRM models can process. Extensive studies on various real networks show promising performance.
LGAug 10, 2015
Dropout Training for SVMs with Data AugmentationNing Chen, Jun Zhu, Jianfei Chen et al.
Dropout and other feature noising schemes have shown promising results in controlling over-fitting by artificially corrupting the training data. Though extensive theoretical and empirical studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches for supervised learning. This paper presents dropout training for both linear SVMs and the nonlinear extension with latent representation learning. For linear SVMs, to deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least square (IRLS) algorithm by exploring data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least square problem, where the re-weights are analytically updated. For nonlinear latent SVMs, we consider learning one layer of latent representations in SVMs and extend the data augmentation technique in conjunction with first-order Taylor-expansion to deal with the intractable expected non-smooth hinge loss and the nonlinearity of latent representations. Finally, we apply the similar data augmentation ideas to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions, and we further develop a non-linear extension of logistic regression by incorporating one layer of latent representations. Our algorithms offer insights on the connection and difference between the hinge loss and logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training on significantly boosting the classification accuracy of both linear and nonlinear SVMs. In addition, the nonlinear SVMs further improve the prediction performance on several image datasets.
LGApr 16, 2014
Dropout Training for Support Vector MachinesNing Chen, Jun Zhu, Jianfei Chen et al.
Dropout and other feature noising schemes have shown promising results in controlling over-fitting by artificially corrupting the training data. Though extensive theoretical and empirical studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches for supervised learning. This paper presents dropout training for linear SVMs. To deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least square (IRLS) algorithm by exploring data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least square problem, where the re-weights have closed-form solutions. The similar ideas are applied to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions. Our algorithms offer insights on the connection and difference between the hinge loss and logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training on significantly boosting the classification accuracy of linear SVMs.
MLOct 10, 2013
Gibbs Max-margin Topic Models with Data AugmentationJun Zhu, Ning Chen, Hugh Perkins et al.
Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen testing data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements on time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks.
LGOct 9, 2013
Discriminative Relational Topic ModelsNing Chen, Jun Zhu, Fei Xia et al.
Many scientific and engineering fields involve analyzing network data. For document networks, relational topic models (RTMs) provide a probabilistic generative process to describe both the link structure and document contents, and they have shown promise on predicting network structures and discovering latent topic representations. However, existing RTMs have limitations in both the restricted model expressiveness and incapability of dealing with imbalanced network data. To expand the scope and improve the inference accuracy of RTMs, this paper presents three extensions: 1) unlike the common link likelihood with a diagonal weight matrix that allows the-same-topic interactions only, we generalize it to use a full weight matrix that captures all pairwise topic interactions and is applicable to asymmetric networks; 2) instead of doing standard Bayesian inference, we perform regularized Bayesian inference (RegBayes) with a regularization parameter to deal with the imbalanced link structure issue in common real networks and improve the discriminative ability of learned latent representations; and 3) instead of doing variational approximation with strict mean-field assumptions, we present collapsed Gibbs sampling algorithms for the generalized relational topic models by exploring data augmentation without making restricting assumptions. Under the generic RegBayes framework, we carefully investigate two popular discriminative loss functions, namely, the logistic log-loss and the max-margin hinge loss. Experimental results on several real network datasets demonstrate the significance of these extensions on improving the prediction performance, and the time efficiency can be dramatically improved with a simple fast approximation method.
LGOct 5, 2012
Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMsJun Zhu, Ning Chen, Eric P. Xing
Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks whose Bayesian formulation results in hybrid chain graph models. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets, which appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics. Such results were not available until now, and contribute to push forward the interface between these two important subfields, which have been largely treated as isolated in the community.
CCMay 6, 2012
On the Complexity of Trial and ErrorXiaohui Bei, Ning Chen, Shengyu Zhang
Motivated by certain applications from physics, biochemistry, economics, and computer science, in which the objects under investigation are not accessible because of various limitations, we propose a trial-and-error model to examine algorithmic issues in such situations. Given a search problem with a hidden input, we are asked to find a valid solution, to find which we can propose candidate solutions (trials), and use observed violations (errors), to prepare future proposals. In accordance with our motivating applications, we consider the fairly broad class of constraint satisfaction problems, and assume that errors are signaled by a verification oracle in the format of the index of a violated constraint (with the content of the constraint still hidden). Our discoveries are summarized as follows. On one hand, despite the seemingly very little information provided by the verification oracle, efficient algorithms do exist for a number of important problems. For the Nash, Core, Stable Matching, and SAT problems, the unknown-input versions are as hard as the corresponding known-input versions, up to a factor of polynomial. We further give almost tight bounds on the latter two problems' trial complexities. On the other hand, there are problems whose complexities are substantially increased in the unknown-input model. In particular, no time-efficient algorithms exist (under standard hardness assumptions) for Graph Isomorphism and Group Isomorphism problems. The tools used to achieve these results include order theory, strong ellipsoid method, and some non-standard reductions. Our model investigates the value of information, and our results demonstrate that the lack of input information can introduce various levels of extra difficulty. The model exhibits intimate connections with (and we hope can also serve as a useful supplement to) certain existing learning and complexity theories.