Chi Harold Liu

CV
h-index26
40papers
2,737citations
Novelty53%
AI Score60

40 Papers

CVApr 19, 2022
SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic Segmentation

Binhui Xie, Shuang Li, Mingjia Li et al. · tsinghua

Domain adaptive semantic segmentation attempts to make satisfactory dense predictions on an unlabeled target domain by utilizing the supervised model trained on a labeled source domain. In this work, we propose Semantic-Guided Pixel Contrast (SePiCo), a novel one-stage adaptation framework that highlights the semantic concepts of individual pixels to promote learning of class-discriminative and class-balanced pixel representations across domains, eventually boosting the performance of self-training methods. Specifically, to explore proper semantic concepts, we first investigate a centroid-aware pixel contrast that employs the category centroids of the entire source domain or a single source image to guide the learning of discriminative features. Considering the possible lack of category diversity in semantic concepts, we then blaze a trail of distributional perspective to involve a sufficient quantity of instances, namely distribution-aware pixel contrast, in which we approximate the true distribution of each semantic category from the statistics of labeled source data. Moreover, such an optimization objective can derive a closed-form upper bound by implicitly involving an infinite number of (dis)similar pairs, making it computationally efficient. Extensive experiments show that SePiCo not only helps stabilize training but also yields discriminative representations, making significant progress on both synthetic-to-real and daytime-to-nighttime adaptation scenarios.

CVNov 22, 2022Code
VBLC: Visibility Boosting and Logit-Constraint Learning for Domain Adaptive Semantic Segmentation under Adverse Conditions

Mingjia Li, Binhui Xie, Shuang Li et al.

Generalizing models trained on normal visual conditions to target domains under adverse conditions is demanding in the practical systems. One prevalent solution is to bridge the domain gap between clear- and adverse-condition images to make satisfactory prediction on the target. However, previous methods often reckon on additional reference images of the same scenes taken from normal conditions, which are quite tough to collect in reality. Furthermore, most of them mainly focus on individual adverse condition such as nighttime or foggy, weakening the model versatility when encountering other adverse weathers. To overcome the above limitations, we propose a novel framework, Visibility Boosting and Logit-Constraint learning (VBLC), tailored for superior normal-to-adverse adaptation. VBLC explores the potential of getting rid of reference images and resolving the mixture of adverse conditions simultaneously. In detail, we first propose the visibility boost module to dynamically improve target images via certain priors in the image level. Then, we figure out the overconfident drawback in the conventional cross-entropy loss for self-training method and devise the logit-constraint learning, which enforces a constraint on logit outputs during training to mitigate this pain point. To the best of our knowledge, this is a new perspective for tackling such a challenging task. Extensive experiments on two normal-to-adverse domain adaptation benchmarks, i.e., Cityscapes -> ACDC and Cityscapes -> FoggyCityscapes + RainCityscapes, verify the effectiveness of VBLC, where it establishes the new state of the art. Code is available at https://github.com/BIT-DA/VBLC.

84.8CVMay 28
FlowSeg: Dynamic Semantic Guidance for LLM-Conditioned Segmentation

Zekang Zhang, Guangyu Gao, Youyun Tang et al.

LLM-conditioned segmentation has recently advanced rapidly by coupling large language models with iterative mask generation frameworks. However, we identify a persistent failure mode in current propose-then-select pipelines. Although high-quality mask candidates are often generated, the final prediction may fail to match the given linguistic condition. This failure arises because language semantics are typically used as static prompts or post-hoc matching signals, rather than participating in the iterative mask generation process. Through systematic analysis, we show that many errors stem from semantic misalignment rather than poor mask quality. To address this issue, we propose FlowSeg, which introduces dynamic semantic guidance via a bidirectional semantic flow between intermediate decoding states and LLM-derived condition embeddings throughout the generation process. Language conditions actively guide mask refinement at each stage, while condition embeddings are progressively updated by emerging visual evidence. This design yields semantically grounded mask representations and visually aligned language conditions, enabling more reliable matching. We further incorporate a lightweight boundary-aware refinement to selectively enhance uncertain regions without perturbing confident interiors. Extensive experiments on referring expression segmentation and reasoning segmentation tasks demonstrate that FlowSeg consistently improves language-mask alignment and achieves state-of-the-art performance. Project page: https://zkzhang98.github.io/FlowSeg_page

CVApr 29, 2022
Improving Transferability for Domain Adaptive Detection Transformers

Kaixiong Gong, Shuang Li, Shugang Li et al.

DETR-style detectors stand out amongst in-domain scenarios, but their properties in domain shift settings are under-explored. This paper aims to build a simple but effective baseline with a DETR-style detector on domain shift settings based on two findings. For one, mitigating the domain shift on the backbone and the decoder output features excels in getting favorable results. For another, advanced domain alignment methods in both parts further enhance the performance. Thus, we propose the Object-Aware Alignment (OAA) module and the Optimal Transport based Alignment (OTA) module to achieve comprehensive domain alignment on the outputs of the backbone and the detector. The OAA module aligns the foreground regions identified by pseudo-labels in the backbone outputs, leading to domain-invariant based features. The OTA module utilizes sliced Wasserstein distance to maximize the retention of location information while minimizing the domain gap in the decoder outputs. We implement the findings and the alignment modules into our adaptation method, and it benchmarks the DETR-style detector on the domain shift settings. Experiments on various domain adaptive scenarios validate the effectiveness of our method.

CVFeb 27, 2023
Dirichlet-based Uncertainty Calibration for Active Domain Adaptation

Mixue Xie, Shuang Li, Rui Zhang et al.

Active domain adaptation (DA) aims to maximally boost the model adaptation on a new target domain by actively selecting limited target data to annotate, whereas traditional active learning methods may be less effective since they do not consider the domain shift issue. Despite active DA methods address this by further proposing targetness to measure the representativeness of target domain characteristics, their predictive uncertainty is usually based on the prediction of deterministic models, which can easily be miscalibrated on data with distribution shift. Considering this, we propose a \textit{Dirichlet-based Uncertainty Calibration} (DUC) approach for active DA, which simultaneously achieves the mitigation of miscalibration and the selection of informative target samples. Specifically, we place a Dirichlet prior on the prediction and interpret the prediction as a distribution on the probability simplex, rather than a point estimate like deterministic models. This manner enables us to consider all possible predictions, mitigating the miscalibration of unilateral prediction. Then a two-round selection strategy based on different uncertainty origins is designed to select target samples that are both representative of target domain and conducive to discriminability. Extensive experiments on cross-domain image classification and semantic segmentation validate the superiority of DUC.

77.4AIJun 1
SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

Lichao Wang, Zhaoxing Ren, Tianzhuo Yang et al.

As Large Language Model (LLM) agents increasingly leverage the Model Context Protocol (MCP) to operate in complex environments, the expansion of their action spaces offers agents unsafe capabilities and underscores the risk of power-seeking. While broad action space and greater environment influence are essential for task fulfillment, they create a fragile risk surface where minor errors or hallucinations are magnified into catastrophic failures. In response, we propose SafeMCP, a {server-side} defense plugin that constrains tool acquisition via predictive reasoning regarding future safety risks. SafeMCP utilizes an internal world model for look-ahead reasoning to implement a two-tier defense: proactive tool filtering to constrain hazardous power expansion and immediate intervention as a fail-safe. To train SafeMCP, we introduce a three-stage pipeline comprising environmental dynamic grounding, safe policy initialization, and reinforcement learning (RL) with dual verifiable rewards. Experiments on PowerSeeking Bench, ToolEmu, and AgentHarm show that SafeMCP achieves a safe equilibrium, effectively mitigating risks while preserving agent utility.

CVAug 2, 2022
Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation

Wenxuan Ma, Jinming Zhang, Shuang Li et al. · tsinghua

Extensive studies on Unsupervised Domain Adaptation (UDA) have propelled the deployment of deep learning from limited experimental datasets into real-world unconstrained domains. Most UDA approaches align features within a common embedding space and apply a shared classifier for target prediction. However, since a perfectly aligned feature space may not exist when the domain discrepancy is large, these methods suffer from two limitations. First, the coercive domain alignment deteriorates target domain discriminability due to lacking target label supervision. Second, the source-supervised classifier is inevitably biased to source data, thus it may underperform in target domain. To alleviate these issues, we propose to simultaneously conduct feature alignment in two individual spaces focusing on different domains, and create for each space a domain-oriented classifier tailored specifically for that domain. Specifically, we design a Domain-Oriented Transformer (DOT) that has two individual classification tokens to learn different domain-oriented representations, and two classifiers to preserve domain-wise discriminability. Theoretical guaranteed contrastive-based alignment and the source-guided pseudo-label refinement strategy are utilized to explore both domain-invariant and specific information. Comprehensive experiments validate that our method achieves state-of-the-art on several benchmarks.

LGMar 27, 2022
Causality Inspired Representation Learning for Domain Generalization

Fangrui Lv, Jian Liang, Shuang Li et al.

Domain generalization (DG) is essentially an out-of-distribution problem, aiming to generalize the knowledge learned from multiple source domains to an unseen target domain. The mainstream is to leverage statistical models to model the dependence between data and labels, intending to learn representations independent of domain. Nevertheless, the statistical models are superficial descriptions of reality since they are only required to model dependence instead of the intrinsic causal mechanism. When the dependence changes with the target distribution, the statistic models may fail to generalize. In this regard, we introduce a general structural causal model to formalize the DG problem. Specifically, we assume that each input is constructed from a mix of causal factors (whose relationship with the label is invariant across domains) and non-causal factors (category-independent), and only the former cause the classification judgments. Our goal is to extract the causal factors from inputs and then reconstruct the invariant causal mechanisms. However, the theoretical idea is far from practical of DG since the required causal/non-causal factors are unobserved. We highlight that ideal causal factors should meet three basic properties: separated from the non-causal ones, jointly independent, and causally sufficient for the classification. Based on that, we propose a Causality Inspired Representation Learning (CIRL) algorithm that enforces the representations to satisfy the above properties and then uses them to simulate the causal factors, which yields improved generalization ability. Extensive experimental results on several widely used datasets verify the effectiveness of our approach.

CVOct 10, 2023Code
CoinSeg: Contrast Inter- and Intra- Class Representations for Incremental Segmentation

Zekang Zhang, Guangyu Gao, Jianbo Jiao et al.

Class incremental semantic segmentation aims to strike a balance between the model's stability and plasticity by maintaining old knowledge while adapting to new concepts. However, most state-of-the-art methods use the freeze strategy for stability, which compromises the model's plasticity.In contrast, releasing parameter training for plasticity could lead to the best performance for all categories, but this requires discriminative feature representation.Therefore, we prioritize the model's plasticity and propose the Contrast inter- and intra-class representations for Incremental Segmentation (CoinSeg), which pursues discriminative representations for flexible parameter tuning. Inspired by the Gaussian mixture model that samples from a mixture of Gaussian distributions, CoinSeg emphasizes intra-class diversity with multiple contrastive representation centroids. Specifically, we use mask proposals to identify regions with strong objectness that are likely to be diverse instances/centroids of a category. These mask proposals are then used for contrastive representations to reinforce intra-class diversity. Meanwhile, to avoid bias from intra-class diversity, we also apply category-level pseudo-labels to enhance category-level consistency and inter-category diversity. Additionally, CoinSeg ensures the model's stability and alleviates forgetting through a specific flexible tuning strategy. We validate CoinSeg on Pascal VOC 2012 and ADE20K datasets with multiple incremental scenarios and achieve superior results compared to previous state-of-the-art methods, especially in more challenging and realistic long-term scenarios. Code is available at https://github.com/zkzhang98/CoinSeg.

LGDec 4, 2022
FedKNOW: Federated Continual Learning with Signature Task Knowledge Integration at Edge

Yaxin Luopan, Rui Han, Qinglong Zhang et al.

Deep Neural Networks (DNNs) have been ubiquitously adopted in internet of things and are becoming an integral of our daily life. When tackling the evolving learning tasks in real world, such as classifying different types of objects, DNNs face the challenge to continually retrain themselves according to the tasks on different edge devices. Federated continual learning is a promising technique that offers partial solutions but yet to overcome the following difficulties: the significant accuracy loss due to the limited on-device processing, the negative knowledge transfer caused by the limited communication of non-IID data, and the limited scalability on the tasks and edge devices. In this paper, we propose FedKNOW, an accurate and scalable federated continual learning framework, via a novel concept of signature task knowledge. FedKNOW is a client side solution that continuously extracts and integrates the knowledge of signature tasks which are highly influenced by the current task. Each client of FedKNOW is composed of a knowledge extractor, a gradient restorer and, most importantly, a gradient integrator. Upon training for a new task, the gradient integrator ensures the prevention of catastrophic forgetting and mitigation of negative knowledge transfer by effectively combining signature tasks identified from the past local tasks and other clients' current tasks through the global model. We implement FedKNOW in PyTorch and extensively evaluate it against state-of-the-art techniques using popular federated continual learning benchmarks. Extensive evaluation results on heterogeneous edge devices show that FedKNOW improves model accuracy by 63.24% without increasing model training time, reduces communication cost by 34.28%, and achieves more improvements under difficult scenarios such as large numbers of tasks or clients, and training different complex networks.

CVApr 26, 2022
ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

Zhenjie Yu, Kai Chen, Shuang Li et al.

Infrared cameras are often utilized to enhance the night vision since the visible light cameras exhibit inferior efficacy without sufficient illumination. However, infrared data possesses inadequate color contrast and representation ability attributed to its intrinsic heat-related imaging principle. This makes it arduous to capture and analyze information for human beings, meanwhile hindering its application. Although, the domain gaps between unpaired nighttime infrared and daytime visible videos are even huger than paired ones that captured at the same time, establishing an effective translation mapping will greatly contribute to various fields. In this case, the structural knowledge within nighttime infrared videos and semantic information contained in the translated daytime visible pairs could be utilized simultaneously. To this end, we propose a tailored framework ROMA that couples with our introduced cRoss-domain regiOn siMilarity mAtching technique for bridging the huge gaps. To be specific, ROMA could efficiently translate the unpaired nighttime infrared videos into fine-grained daytime visible ones, meanwhile maintain the spatiotemporal consistency via matching the cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish the details from synthesized visible results and real references. Extensive experiments and evaluations for specific applications indicate ROMA outperforms the state-of-the-art methods. Moreover, we provide a new and challenging dataset encouraging further research for unpaired nighttime infrared and daytime visible video translation, named InfraredCity. In particular, it consists of 9 long video clips including City, Highway and Monitor scenarios. All clips could be split into 603,142 frames in total, which are 20 times larger than the recently released daytime infrared-to-visible dataset IRVI.

CVOct 31, 2023
Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

Binhui Xie, Shuang Li, Qingju Guo et al.

Active learning, a label-efficient paradigm, empowers models to interactively query an oracle for labeling new data. In the realm of LiDAR semantic segmentation, the challenges stem from the sheer volume of point clouds, rendering annotation labor-intensive and cost-prohibitive. This paper presents Annotator, a general and efficient active learning baseline, in which a voxel-centric online selection strategy is tailored to efficiently probe and annotate the salient and exemplar voxel girds within each LiDAR scan, even under distribution shift. Concretely, we first execute an in-depth analysis of several common selection strategies such as Random, Entropy, Margin, and then develop voxel confusion degree (VCD) to exploit the local topology relations and structures of point clouds. Annotator excels in diverse settings, with a particular focus on active learning (AL), active source-free domain adaptation (ASFDA), and active domain adaptation (ADA). It consistently delivers exceptional performance across LiDAR semantic segmentation benchmarks, spanning both simulation-to-real and real-to-real scenarios. Surprisingly, Annotator exhibits remarkable efficiency, requiring significantly fewer annotations, e.g., just labeling five voxels per scan in the SynLiDAR-to-SemanticKITTI task. This results in impressive performance, achieving 87.8% fully-supervised performance under AL, 88.5% under ASFDA, and 94.4% under ADA. We envision that Annotator will offer a simple, general, and efficient solution for label-efficient 3D applications. Project page: https://binhuixie.github.io/annotator-web

74.6CVMar 19Code
Rethinking MLLM Itself as a Segmenter with a Single Segmentation Token

Anqi Zhang, Xiaokang Ji, Guangyu Gao et al.

Recent segmentation methods leveraging Multi-modal Large Language Models (MLLMs) have shown reliable object-level segmentation and enhanced spatial perception. However, almost all previous methods predominantly rely on specialist mask decoders to interpret masks from generated segmentation-related embeddings and visual features, or incorporate multiple additional tokens to assist. This paper aims to investigate whether and how we can unlock segmentation from MLLM itSELF with 1 segmentation Embedding (SELF1E) while achieving competitive results, which eliminates the need for external decoders. To this end, our approach targets the fundamental limitation of resolution reduction in pixel-shuffled image features from MLLMs. First, we retain image features at their original uncompressed resolution, and refill them with residual features extracted from MLLM-processed compressed features, thereby improving feature precision. Subsequently, we integrate pixel-unshuffle operations on image features with and without LLM processing, respectively, to unleash the details of compressed features and amplify the residual features under uncompressed resolution, which further enhances the resolution of refilled features. Moreover, we redesign the attention mask with dual perception pathways, i.e., image-to-image and image-to-segmentation, enabling rich feature interaction between pixels and the segmentation token. Comprehensive experiments across multiple segmentation tasks validate that SELF1E achieves performance competitive with specialist mask decoder-based methods, demonstrating the feasibility of decoder-free segmentation in MLLMs. Project page: https://github.com/ANDYZAQ/SELF1E.

LGJan 17, 2023
Hierarchical Memory Pool Based Edge Semi-Supervised Continual Learning Method

Xiangwei Wang, Rui Han, Chi Harold Liu

The continuous changes in the world have resulted in the performance regression of neural networks. Therefore, continual learning (CL) area gradually attracts the attention of more researchers. For edge intelligence, the CL model not only needs to overcome catastrophic for-getting, but also needs to face the huge challenge of severely limited resources: the lack of labeled resources and powerful devices. However, the existing classic CL methods usually rely on a large number of labeled samples to maintain the plasticity and stability, and the semi-supervised learning methods often need to pay a large computational and memory overhead for higher accuracy. In response to these prob-lems, a low-cost semi-supervised CL method named Edge Hierarchical Memory Learner (EdgeHML) will be proposed. EdgeHML can effec-tively utilize a large number of unlabeled samples and a small number of labeled samples. It is based on a hierarchical memory pool, lever-age multi-level storage structure to store and replay samples. EdgeHML implements the interaction between different levels through a combination of online and offline strategies. In addition, in order to further reduce the computational overhead for unlabeled samples, EdgeHML leverages a progressive learning method. It reduces the computation cycles of unlabeled samples by controlling the learning process. The experimental results show that on three semi-supervised CL tasks, EdgeHML can improve the model accuracy by up to 16.35% compared with the classic CL method, and the training iterations time can be reduced by more than 50% compared with semi-supervised methods. EdgeHML achieves a semi-supervised CL process with high performance and low overhead for edge intelligence.

58.9HCApr 12
ZoomTable: Interactive Exploration of Data Facts in Hierarchical Tables via Semantic Zooming

Qiyang Chen, Guozheng Li, Xingqi Wang et al.

Hierarchical tables are an important structure for organizing data with inherent hierarchical relationships. Existing studies have extensively explored methods for data fact exploration from tabular data. In particular, some studies have directly integrated visual data facts into the original table structure to support in-situ exploration, because embedding data facts within the table context can reduce cognitive load by minimizing attention shifts. However, embedding a large amount of extracted data facts into the limited space of hierarchical tables often leads to layout conflicts, hindering effective exploration. To address this issue, we propose an interactive exploration paradigm for hierarchical table data facts based on semantic zooming and develop an interactive visualization system, ZoomTable. The ZoomTable system employs semantic zooming as the interaction method, combined with a data-fact layout method and a data fact recommendation mechanism. This combination not only resolves layout conflicts, but also supports users in coherently exploring multidimensional data facts at different scales. A case study and a user experiment further validate the practicality and efficiency of ZoomTable in real-world data fact exploration scenarios.

LGDec 2, 2021Code
Active Learning for Domain Adaptation: An Energy-Based Approach

Binhui Xie, Longhui Yuan, Shuang Li et al.

Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped to reach the fully supervised performance. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from an observation that energy-based models exhibit \textit{free energy biases} when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically reveal that a simple yet efficient energy-based sampling strategy sheds light on selecting the most valuable target samples than existing approaches requiring particular architectures or computation of the distances. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristic and instance uncertainty into every selection round. Meanwhile, by aligning the free energy of target data compact around the source domain via a regularization term, domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at https://github.com/BIT-DA/EADA.

CVNov 25, 2021Code
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation

Binhui Xie, Longhui Yuan, Shuang Li et al.

Self-training has greatly facilitated domain adaptive semantic segmentation, which iteratively generates pseudo labels on unlabeled target data and retrains the network. However, realistic segmentation datasets are highly imbalanced, pseudo labels are typically biased to the majority classes and basically noisy, leading to an error-prone and suboptimal model. In this paper, we propose a simple region-based active learning approach for semantic segmentation under a domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance. Our algorithm, Region Impurity and Prediction Uncertainty (RIPU), introduces a new acquisition strategy characterizing the spatial adjacency of image regions along with the prediction confidence. We show that the proposed region-based selection strategy makes more efficient use of a limited budget than image-based or point-based counterparts. Further, we enforce local prediction consistency between a pixel and its nearest neighbors on a source image. Alongside, we develop a negative learning loss to make the features more discriminative. Extensive experiments demonstrate that our method only requires very few annotations to almost reach the supervised performance and substantially outperforms state-of-the-art methods. The code is available at https://github.com/BIT-DA/RIPU.

CVAug 2, 2021Code
I2V-GAN: Unpaired Infrared-to-Visible Video Translation

Shuang Li, Bingfeng Han, Zhenjie Yu et al.

Human vision is often adversely affected by complex environmental factors, especially in night vision scenarios. Thus, infrared cameras are often leveraged to help enhance the visual effects via detecting infrared radiation in the surrounding environment, but the infrared videos are undesirable due to the lack of detailed semantic information. In such a case, an effective video-to-video translation method from the infrared domain to the visible light counterpart is strongly needed by overcoming the intrinsic huge gap between infrared and visible fields. To address this challenging problem, we propose an infrared-to-visible (I2V) video translation method I2V-GAN to generate fine-grained and spatial-temporal consistent visible light videos by given unpaired infrared videos. Technically, our model capitalizes on three types of constraints: 1)adversarial constraint to generate synthetic frames that are similar to the real ones, 2)cyclic consistency with the introduced perceptual loss for effective content conversion as well as style preservation, and 3)similarity constraints across and within domains to enhance the content and motion consistency in both spatial and temporal spaces at a fine-grained level. Furthermore, the current public available infrared and visible light datasets are mainly used for object detection or tracking, and some are composed of discontinuous images which are not suitable for video tasks. Thus, we provide a new dataset for I2V video translation, which is named IRVI. Specifically, it has 12 consecutive video clips of vehicle and monitoring scenes, and both infrared and visible light videos could be apart into 24352 frames. Comprehensive experiments validate that I2V-GAN is superior to the compared SOTA methods in the translation of I2V videos with higher fluency and finer semantic details. The code and IRVI dataset are available at https://github.com/BIT-DA/I2V-GAN.

CVMay 11, 2021Code
Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation

Shuang Li, Binhui Xie, Bin Zang et al.

Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain. Current state-of-the-art works suggest that performing category alignment can alleviate domain shift reasonably. However, they are mainly based on image-to-image adversarial training and little consideration is given to semantic variations of an object among images, failing to capture a comprehensive picture of different categories. This motivates us to explore a holistic representative, the semantic distribution from each category in source domain, to mitigate the problem above. In this paper, we present semantic distribution-aware contrastive adaptation algorithm that enables pixel-wise representation alignment under the guidance of semantic distributions. Specifically, we first design a pixel-wise contrastive loss by considering the correspondences between semantic distributions and pixel-wise representations from both domains. Essentially, clusters of pixel representations from the same category should cluster together and those from different categories should spread out. Next, an upper bound on this formulation is derived by involving the learning of an infinite number of (dis)similar pairs, making it efficient. Finally, we verify that SDCA can further improve segmentation accuracy when integrated with the self-supervised learning. We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.The code is publicly available at https://github.com/BIT-DA/SDCA

CVAug 4, 2020Code
Simultaneous Semantic Alignment Network for Heterogeneous Domain Adaptation

Shuang Li, Binhui Xie, Jiashu Wu et al.

Heterogeneous domain adaptation (HDA) transfers knowledge across source and target domains that present heterogeneities e.g., distinct domain distributions and difference in feature type or dimension. Most previous HDA methods tackle this problem through learning a domain-invariant feature subspace to reduce the discrepancy between domains. However, the intrinsic semantic properties contained in data are under-explored in such alignment strategy, which is also indispensable to achieve promising adaptability. In this paper, we propose a Simultaneous Semantic Alignment Network (SSAN) to simultaneously exploit correlations among categories and align the centroids for each category across domains. In particular, we propose an implicit semantic correlation loss to transfer the correlation knowledge of source categorical prediction distributions to target domain. Meanwhile, by leveraging target pseudo-labels, a robust triplet-centroid alignment mechanism is explicitly applied to align feature representations for each category. Notably, a pseudo-label refinement procedure with geometric similarity involved is introduced to enhance the target pseudo-label assignment accuracy. Comprehensive experiments on various HDA tasks across text-to-image, image-to-image and text-to-text successfully validate the superiority of our SSAN against state-of-the-art HDA methods. The code is publicly available at https://github.com/BIT-DA/SSAN.

CLMar 5, 2024
Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

Bin Zhang, Yuxiao Ye, Guoqing Du et al.

Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL process, which hinders the assessment of LLMs' cognitive capabilities and the optimization of LLM-based solutions. To address the aforementioned issues, we firstly construct a new dataset designed to mitigate the risk of overfitting in LLMs. Then we formulate five evaluation tasks to comprehensively assess the performance of diverse methods across various LLMs throughout the Text-to-SQL process.Our study highlights the performance disparities among LLMs and proposes optimal in-context learning solutions tailored to each task. These findings offer valuable insights for enhancing the development of LLM-based Text-to-SQL systems.

87.1ROApr 29
STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation

Yuxuan Tian, Yurun Jin, Bin Yu et al.

Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-temporal prediction with action generation. STARRY jointly denoises future spatial-temporal latents and action sequences, and introduces Geometry-Aware Selective Attention Modulation to convert predicted depth and end-effector geometry into token-aligned weights for selective action-attention modulation. On RoboTwin 2.0, STARRY achieves 93.82% / 93.30% average success under Clean and Randomized settings. Real-world experiments further improve average success from 42.5% to 70.8% over $π_{0.5}$, demonstrating the effectiveness of action-centric spatial-temporal world modeling for spatial-temporally demanding robotic action generation.

LGDec 22, 2024
ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

Kun Wu, Yinuo Zhao, Zhiyuan Xu et al.

Offline Reinforcement Learning (RL), which operates solely on static datasets without further interactions with the environment, provides an appealing alternative to learning a safe and promising control policy. The prevailing methods typically learn a conservative policy to mitigate the problem of Q-value overestimation, but it is prone to overdo it, leading to an overly conservative policy. Moreover, they optimize all samples equally with fixed constraints, lacking the nuanced ability to control conservative levels in a fine-grained manner. Consequently, this limitation results in a performance decline. To address the above two challenges in a united way, we propose a framework, Adaptive Conservative Level in Q-Learning (ACL-QL), which limits the Q-values in a mild range and enables adaptive control on the conservative level over each state-action pair, i.e., lifting the Q-values more for good transitions and less for bad transitions. We theoretically analyze the conditions under which the conservative level of the learned Q-function can be limited in a mild range and how to optimize each transition adaptively. Motivated by the theoretical analysis, we propose a novel algorithm, ACL-QL, which uses two learnable adaptive weight functions to control the conservative level over each transition. Subsequently, we design a monotonicity loss and surrogate losses to train the adaptive weight functions, Q-function, and policy network alternatively. We evaluate ACL-QL on the commonly used D4RL benchmark and conduct extensive ablation studies to illustrate the effectiveness and state-of-the-art performance compared to existing offline DRL baselines.

CVApr 5, 2025
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation

Kai Fang, Anqi Zhang, Guangyu Gao et al.

Effective Class Incremental Segmentation (CIS) requires simultaneously mitigating catastrophic forgetting and ensuring sufficient plasticity to integrate new classes. The inherent conflict above often leads to a back-and-forth, which turns the objective into finding the balance between the performance of previous~(old) and incremental~(new) classes. To address this conflict, we introduce a novel approach, Conflict Mitigation via Branched Optimization~(CoMBO). Within this approach, we present the Query Conflict Reduction module, designed to explicitly refine queries for new classes through lightweight, class-specific adapters. This module provides an additional branch for the acquisition of new classes while preserving the original queries for distillation. Moreover, we develop two strategies to further mitigate the conflict following the branched structure, \textit{i.e.}, the Half-Learning Half-Distillation~(HDHL) over classification probabilities, and the Importance-Based Knowledge Distillation~(IKD) over query features. HDHL selectively engages in learning for classification probabilities of queries that match the ground truth of new classes, while aligning unmatched ones to the corresponding old probabilities, thus ensuring retention of old knowledge while absorbing new classes via learning negative samples. Meanwhile, IKD assesses the importance of queries based on their matching degree to old classes, prioritizing the distillation of important features and allowing less critical features to evolve. Extensive experiments in Class Incremental Panoptic and Semantic Segmentation settings have demonstrated the superior performance of CoMBO. Project page: https://guangyu-ryan.github.io/CoMBO.

LGDec 29, 2023
HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning

Hao Wang, Bo Tang, Chi Harold Liu et al.

Online display advertising platforms service numerous advertisers by providing real-time bidding (RTB) for the scale of billions of ad requests every day. The bidding strategy handles ad requests cross multiple channels to maximize the number of clicks under the set financial constraints, i.e., total budget and cost-per-click (CPC), etc. Different from existing works mainly focusing on single channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called ``HiBid'', consisted of a high-level planner equipped with auxiliary loss for non-competitive budget allocation, and a data augmentation enhanced low-level executor for adaptive bidding strategy in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both the large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). We also deploy HiBid on Meituan advertising platform to already service tens of thousands of advertisers every day.

ROJan 17, 2024
Efficient Training of Generalizable Visuomotor Policies via Control-Aware Augmentation

Yinuo Zhao, Kun Wu, Tianjiao Yi et al.

Improving generalization is one key challenge in embodied AI, where obtaining large-scale datasets across diverse scenarios is costly. Traditional weak augmentations, such as cropping and flipping, are insufficient for improving a model's performance in new environments. Existing data augmentation methods often disrupt task-relevant information in images, potentially degrading performance. To overcome these challenges, we introduce EAGLE, an efficient training framework for generalizable visuomotor policies that improves upon existing methods by (1) enhancing generalization by applying augmentation only to control-related regions identified through a self-supervised control-aware mask and (2) improving training stability and efficiency by distilling knowledge from an expert to a visuomotor student policy, which is then deployed to unseen environments without further fine-tuning. Comprehensive experiments on three domains, including the DMControl Generalization Benchmark, the enhanced Robot Manipulation Distraction Benchmark, and a long-sequential drawer-opening task, validate the effectiveness of our method.

ROMar 31, 2025
HACTS: a Human-As-Copilot Teleoperation System for Robot Learning

Zhiyuan Xu, Yinuo Zhao, Kun Wu et al.

Teleoperation is essential for autonomous robot learning, especially in manipulation tasks that require human demonstrations or corrections. However, most existing systems only offer unilateral robot control and lack the ability to synchronize the robot's status with the teleoperation hardware, preventing real-time, flexible intervention. In this work, we introduce HACTS (Human-As-Copilot Teleoperation System), a novel system that establishes bilateral, real-time joint synchronization between a robot arm and teleoperation hardware. This simple yet effective feedback mechanism, akin to a steering wheel in autonomous vehicles, enables the human copilot to intervene seamlessly while collecting action-correction data for future learning. Implemented using 3D-printed components and low-cost, off-the-shelf motors, HACTS is both accessible and scalable. Our experiments show that HACTS significantly enhances performance in imitation learning (IL) and reinforcement learning (RL) tasks, boosting IL recovery capabilities and data efficiency, and facilitating human-in-the-loop RL. HACTS paves the way for more effective and interactive human-robot collaboration and data-collection, advancing the capabilities of robot manipulation.

CVFeb 17, 2022
CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-based Autonomous Urban Driving

Yinuo Zhao, Kun Wu, Zhiyuan Xu et al.

Vision-based autonomous urban driving in dense traffic is quite challenging due to the complicated urban environment and the dynamics of the driving behaviors. Widely-applied methods either heavily rely on hand-crafted rules or learn from limited human experience, which makes them hard to generalize to rare but critical scenarios. In this paper, we present a novel CAscade Deep REinforcement learning framework, CADRE, to achieve model-free vision-based autonomous urban driving. In CADRE, to derive representative latent features from raw observations, we first offline train a Co-attention Perception Module (CoPM) that leverages the co-attention mechanism to learn the inter-relationships between the visual and control information from a pre-collected driving dataset. Cascaded by the frozen CoPM, we then present an efficient distributed proximal policy optimization framework to online learn the driving policy under the guidance of particularly designed reward functions. We perform a comprehensive empirical study with the CARLA NoCrash benchmark as well as specific obstacle avoidance scenarios in autonomous urban driving tasks. The experimental results well justify the effectiveness of CADRE and its superiority over the state-of-the-art by a wide margin.

CVDec 18, 2021
LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision

Rui Han, Qinglong Zhang, Chi Harold Liu et al.

Deep neural networks (DNNs) have become ubiquitous techniques in mobile and embedded systems for applications such as image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbate the existing limitations of meeting stringent latency/accuracy requirements on resource constrained mobile devices. The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance to resource dynamics. However, such model scaling approaches face to imminent challenges: (i) large space exploration of model sizes, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training times by only extracting and training a small number of common blocks (e.g. 5 in VGG and 8 in ResNet) in a DNN. At run-time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resources and latency constraints, while reducing switching overhead via smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation and model compression) using a set of 12 popular DNN models. Evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, thus achieving as much as 31.74% improvement in inference accuracy and 71.07% reduction in scaling energy consumptions.

LGDec 8, 2021
Pareto Domain Adaptation

Fangrui Lv, Jian Liang, Kaixiong Gong et al.

Domain adaptation (DA) attempts to transfer the knowledge from a labeled source domain to an unlabeled target domain that follows different distribution from the source. To achieve this, DA methods include a source classification objective to extract the source knowledge and a domain alignment objective to diminish the domain shift, ensuring knowledge transfer. Typically, former DA methods adopt some weight hyper-parameters to linearly combine the training objectives to form an overall objective. However, the gradient directions of these objectives may conflict with each other due to domain shift. Under such circumstances, the linear optimization scheme might decrease the overall objective value at the expense of damaging one of the training objectives, leading to restricted solutions. In this paper, we rethink the optimization scheme for DA from a gradient-based perspective. We propose a Pareto Domain Adaptation (ParetoDA) approach to control the overall optimization direction, aiming to cooperatively optimize all training objectives. Specifically, to reach a desirable solution on the target domain, we design a surrogate loss mimicking target classification. To improve target-prediction accuracy to support the mimicking, we propose a target-prediction refining mechanism which exploits domain labels via Bayes' theorem. On the other hand, since prior knowledge of weighting schemes for objectives is often unavailable to guide optimization to approach the optimal solution on the target domain, we propose a dynamic preference mechanism to dynamically guide our cooperative optimization by the gradient of the surrogate loss on a held-out unlabeled target dataset. Extensive experiments on image classification and semantic segmentation benchmarks demonstrate the effectiveness of ParetoDA

CVNov 25, 2021
Exploiting Both Domain-specific and Invariant Knowledge via a Win-win Transformer for Unsupervised Domain Adaptation

Wenxuan Ma, Jinming Zhang, Shuang Li et al.

Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Most existing UDA approaches enable knowledge transfer via learning domain-invariant representation and sharing one classifier across two domains. However, ignoring the domain-specific information that are related to the task, and forcing a unified classifier to fit both domains will limit the feature expressiveness in each domain. In this paper, by observing that the Transformer architecture with comparable parameters can generate more transferable representations than CNN counterparts, we propose a Win-Win TRansformer framework (WinTR) that separately explores the domain-specific knowledge for each domain and meanwhile interchanges cross-domain knowledge. Specifically, we learn two different mappings using two individual classification tokens in the Transformer, and design for each one a domain-specific classifier. The cross-domain knowledge is transferred via source guided label refinement and single-sided feature alignment with respect to source or target, which keeps the integrity of domain-specific information. Extensive experiments on three benchmark datasets show that our method outperforms the state-of-the-art UDA methods, validating the effectiveness of exploiting both domain-specific and invariant

CVAug 12, 2021
Semantic Concentration for Domain Adaptation

Shuang Li, Mixue Xie, Fangrui Lv et al.

Domain adaptation (DA) paves the way for label annotation and dataset bias issues by the knowledge transfer from a label-rich source domain to a related but unlabeled target domain. A mainstream of DA methods is to align the feature distributions of the two domains. However, the majority of them focus on the entire image features where irrelevant semantic information, e.g., the messy background, is inevitably embedded. Enforcing feature alignments in such case will negatively influence the correct matching of objects and consequently lead to the semantically negative transfer due to the confusion of irrelevant semantics. To tackle this issue, we propose Semantic Concentration for Domain Adaptation (SCDA), which encourages the model to concentrate on the most principal features via the pair-wise adversarial alignment of prediction distributions. Specifically, we train the classifier to class-wisely maximize the prediction distribution divergence of each sample pair, which enables the model to find the region with large differences among the same class of samples. Meanwhile, the feature extractor attempts to minimize that discrepancy, which suppresses the features of dissimilar regions among the same class of samples and accentuates the features of principal parts. As a general method, SCDA can be easily integrated into various DA methods as a regularizer to further boost their performance. Extensive experiments on the cross-domain benchmarks show the efficacy of SCDA.

CVMar 26, 2021
Dynamic Domain Adaptation for Efficient Inference

Shuang Li, Jinming Zhang, Wenxuan Ma et al.

Domain adaptation (DA) enables knowledge transfer from a labeled source domain to an unlabeled target domain by reducing the cross-domain distribution discrepancy. Most prior DA approaches leverage complicated and powerful deep neural networks to improve the adaptation capacity and have shown remarkable success. However, they may have a lack of applicability to real-world situations such as real-time interaction, where low target inference latency is an essential requirement under limited computational budget. In this paper, we tackle the problem by proposing a dynamic domain adaptation (DDA) framework, which can simultaneously achieve efficient target inference in low-resource scenarios and inherit the favorable cross-domain generalization brought by DA. In contrast to static models, as a simple yet generic method, DDA can integrate various domain confusion constraints into any typical adaptive network, where multiple intermediate classifiers can be equipped to infer "easier" and "harder" target data dynamically. Moreover, we present two novel strategies to further boost the adaptation performance of multiple prediction exits: 1) a confidence score learning strategy to derive accurate target pseudo labels by fully exploring the prediction consistency of different classifiers; 2) a class-balanced self-training strategy to explicitly adapt multi-stage classifiers from source to target without losing prediction diversity. Extensive experiments on multiple benchmarks are conducted to verify that DDA can consistently improve the adaptation performance and accelerate target inference under domain shift and limited resources scenarios

CVMar 23, 2021
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Shuang Li, Kaixiong Gong, Chi Harold Liu et al.

Real-world training data usually exhibits long-tailed distribution, where several majority classes have a significantly larger number of samples than the remaining minority classes. This imbalance degrades the performance of typical supervised learning algorithms designed for balanced training sets. In this paper, we address this issue by augmenting minority classes with a recently proposed implicit semantic data augmentation (ISDA) algorithm, which produces diversified augmented samples by translating deep features along many semantically meaningful directions. Importantly, given that ISDA estimates the class-conditional statistics to obtain semantic directions, we find it ineffective to do this on minority classes due to the insufficient training data. To this end, we propose a novel approach to learn transformed semantic directions with meta-learning automatically. In specific, the augmentation strategy during training is dynamically optimized, aiming to minimize the loss on a small balanced validation set, which is approximated via a meta update step. Extensive empirical results on CIFAR-LT-10/100, ImageNet-LT, and iNaturalist 2017/2018 validate the effectiveness of our method.

CVMar 23, 2021
Transferable Semantic Augmentation for Domain Adaptation

Shuang Li, Mixue Xie, Kaixiong Gong et al.

Domain adaptation has been widely explored by transferring the knowledge from a label-rich source domain to a related but unlabeled target domain. Most existing domain adaptation algorithms attend to adapting feature representations across two domains with the guidance of a shared source-supervised classifier. However, such classifier limits the generalization ability towards unlabeled target recognition. To remedy this, we propose a Transferable Semantic Augmentation (TSA) approach to enhance the classifier adaptation ability through implicitly generating source features towards target semantics. Specifically, TSA is inspired by the fact that deep feature transformation towards a certain direction can be represented as meaningful semantic altering in the original input space. Thus, source features can be augmented to effectively equip with target semantics to train a more transferable classifier. To achieve this, for each class, we first use the inter-domain feature mean difference and target intra-class feature covariance to construct a multivariate normal distribution. Then we augment source features with random directions sampled from the distribution class-wisely. Interestingly, such source augmentation is implicitly implemented through an expected transferable cross-entropy loss over the augmented source distribution, where an upper bound of the expected loss is derived and minimized, introducing negligible computational overhead. As a light-weight and general technique, TSA can be easily plugged into various domain adaptation methods, bringing remarkable improvements. Comprehensive experiments on cross-domain benchmarks validate the efficacy of TSA.

CVMar 23, 2021
Generalized Domain Conditioned Adaptation Network

Shuang Li, Binhui Xie, Qiuxia Lin et al.

Domain Adaptation (DA) attempts to transfer knowledge learned in the labeled source domain to the unlabeled but related target domain without requiring large amounts of target supervision. Recent advances in DA mainly proceed by aligning the source and target distributions. Despite the significant success, the adaptation performance still degrades accordingly when the source and target domains encounter a large distribution discrepancy. We consider this limitation may attribute to the insufficient exploration of domain-specialized features because most studies merely concentrate on domain-general feature learning in task-specific layers and integrate totally-shared convolutional networks (convnets) to generate common features for both domains. In this paper, we relax the completely-shared convnets assumption adopted by previous DA methods and propose Domain Conditioned Adaptation Network (DCAN), which introduces domain conditioned channel attention module with a multi-path structure to separately excite channel activation for each domain. Such a partially-shared convnets module allows domain-specialized features in low-level to be explored appropriately. Further, given the knowledge transferability varying along with convolutional layers, we develop Generalized Domain Conditioned Adaptation Network (GDCAN) to automatically determine whether domain channel activations should be separately modeled in each attention module. Afterward, the critical domain-specialized knowledge could be adaptively extracted according to the domain statistic gaps. As far as we know, this is the first work to explore the domain-wise convolutional channel activations separately for deep DA networks. Additionally, to effectively match high-level feature distributions across domains, we consider deploying feature adaptation blocks after task-specific layers, which can explicitly mitigate the domain discrepancy.

CVDec 13, 2020
Bi-Classifier Determinacy Maximization for Unsupervised Domain Adaptation

Shuang Li, Fangrui Lv, Binhui Xie et al.

Unsupervised domain adaptation challenges the problem of transferring knowledge from a well-labelled source domain to an unlabelled target domain. Recently,adversarial learning with bi-classifier has been proven effective in pushing cross-domain distributions close. Prior approaches typically leverage the disagreement between bi-classifier to learn transferable representations, however, they often neglect the classifier determinacy in the target domain, which could result in a lack of feature discriminability. In this paper, we present a simple yet effective method, namely Bi-Classifier Determinacy Maximization(BCDM), to tackle this problem. Motivated by the observation that target samples cannot always be separated distinctly by the decision boundary, here in the proposed BCDM, we design a novel classifier determinacy disparity (CDD) metric, which formulates classifier discrepancy as the class relevance of distinct target predictions and implicitly introduces constraint on the target feature discriminability. To this end, the BCDM can generate discriminative representations by encouraging target predictive outputs to be consistent and determined, meanwhile, preserve the diversity of predictions in an adversarial manner. Furthermore, the properties of CDD as well as the theoretical guarantees of BCDM's generalization bound are both elaborated. Extensive experiments show that BCDM compares favorably against the existing state-of-the-art domain adaptation methods.

CVMay 14, 2020
Domain Conditioned Adaptation Network

Shuang Li, Chi Harold Liu, Qiuxia Lin et al.

Tremendous research efforts have been made to thrive deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models only focus on aligning feature representations of task-specific layers across domains while integrating a totally shared convolutional architecture for source and target. However, we argue that such strongly-shared convolutional layers might be harmful for domain-specific feature learning when source and target data distribution differs to a large extent. In this paper, we relax a shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain conditioned channel attention mechanism. As a result, the critical low-level domain-dependent knowledge could be explored appropriately. As far as we know, this is the first work to explore the domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across two domains, we further deploy domain conditioned feature correction blocks after task-specific layers, which will explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.

CVApr 10, 2020
Deep Residual Correction Network for Partial Domain Adaptation

Shuang Li, Chi Harold Liu, Qiuxia Lin et al.

Deep domain adaptation methods have achieved appealing performance by learning transferable representations from a well-labeled source domain to a different but related unlabeled target domain. Most existing works assume source and target data share the identical label space, which is often difficult to be satisfied in many real-world applications. With the emergence of big data, there is a more practical scenario called partial domain adaptation, where we are always accessible to a more large-scale source domain while working on a relative small-scale target domain. In this case, the conventional domain adaptation assumption should be relaxed, and the target label space tends to be a subset of the source label space. Intuitively, reinforcing the positive effects of the most relevant source subclasses and reducing the negative impacts of irrelevant source subclasses are of vital importance to address partial domain adaptation challenge. This paper proposes an efficiently-implemented Deep Residual Correction Network (DRCN) by plugging one residual block into the source network along with the task-specific feature layer, which effectively enhances the adaptation from source to target and explicitly weakens the influence from the irrelevant source classes. Specifically, the plugged residual block, which consists of several fully-connected layers, could deepen basic network and boost its feature representation capability correspondingly. Moreover, we design a weighted class-wise domain alignment loss to couple two domains by matching the feature distributions of shared classes between source and target. Comprehensive experiments on partial, traditional and fine-grained cross-domain visual recognition demonstrate that DRCN is superior to the competitive deep domain adaptation approaches.

NIJan 17, 2018
Experience-driven Networking: A Deep Reinforcement Learning based Approach

Zhiyuan Xu, Jian Tang, Jingsong Meng et al.

Modern communication networks have become very complicated and highly dynamic, which makes them hard to model, predict and control. In this paper, we develop a novel experience-driven approach that can learn to well control a communication network from its own experience rather than an accurate mathematical model, just as a human learns a new skill (such as driving, swimming, etc). Specifically, we, for the first time, propose to leverage emerging Deep Reinforcement Learning (DRL) for enabling model-free control in communication networks; and present a novel and highly effective DRL-based control framework, DRL-TE, for a fundamental networking problem: Traffic Engineering (TE). The proposed framework maximizes a widely-used utility function by jointly learning network environment and its dynamics, and making decisions under the guidance of powerful Deep Neural Networks (DNNs). We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework particularly for TE. To validate and evaluate the proposed framework, we implemented it in ns-3, and tested it comprehensively with both representative and randomly generated network topologies. Extensive packet-level simulation results show that 1) compared to several widely-used baseline methods, DRL-TE significantly reduces end-to-end delay and consistently improves the network utility, while offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-ofthe-art DRL method (for continuous control), Deep Deterministic Policy Gradient (DDPG), which, however, does not offer satisfying performance.