CV | Nov 22, 2023 | Code
SAM4UDASS: When SAM Meets Unsupervised Domain Adaptive Semantic Segmentation in Intelligent Vehicles
Weihao Yan, Yeqiang Qian, Xingyuan Chen et al.
Semantic segmentation plays a critical role in enabling intelligent vehicles to comprehend their surrounding environments. However, deep learning-based methods usually perform poorly in domain shift scenarios due to the lack of labeled data for training. Unsupervised domain adaptation (UDA) techniques have emerged to bridge the gap across different driving scenes and enhance model performance on unlabeled target environments. Although self-training UDA methods have achieved state-of-the-art results, the challenge of generating precise pseudo-labels persists. These pseudo-labels tend to favor majority classes, consequently sacrificing the performance of rare classes or small objects like traffic lights and signs. To address this challenge, we introduce SAM4UDASS, a novel approach that incorporates the Segment Anything Model (SAM) into self-training UDA methods for refining pseudo-labels. It involves Semantic-Guided Mask Labeling, which assigns semantic labels to unlabeled SAM masks using UDA pseudo-labels. Furthermore, we devise fusion strategies aimed at mitigating semantic granularity inconsistency between SAM masks and the target domain. SAM4UDASS integrates SAM with UDA for semantic segmentation in driving scenes and seamlessly complements existing self-training UDA methodologies. Extensive experiments on synthetic-to-real and normal-to-adverse driving datasets demonstrate its effectiveness. It brings more than 3% mIoU gains on GTA5-to-Cityscapes, SYNTHIA-to-Cityscapes, and Cityscapes-to-ACDC when using DAFormer and achieves SOTA when using MIC. The code will be available at https://github.com/ywher/SAM4UDASS.
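The mask-labeling idea in the abstract above can be illustrated with a small sketch. The abstract does not say how a label is chosen for each SAM mask, so the majority vote used here is an assumption, as are the function name `label_masks`, the mask format (sets of pixel coordinates), and the rule that smaller masks overwrite larger ones. It is not the authors' implementation.

```python
from collections import Counter

def label_masks(masks, pseudo_labels, ignore_index=255):
    """Assign each mask the most frequent pseudo-label among its pixels.

    masks: list of sets of (row, col) pixels (hypothetical mask format).
    pseudo_labels: 2D list of class ids predicted by a UDA model.
    Pixels covered by no mask keep their original pseudo-label.
    """
    refined = [row[:] for row in pseudo_labels]
    # Assumption: paint large masks first so small masks overwrite them.
    for mask in sorted(masks, key=len, reverse=True):
        votes = Counter(
            pseudo_labels[r][c] for r, c in mask
            if pseudo_labels[r][c] != ignore_index
        )
        if not votes:
            continue  # mask contains only ignored pixels
        majority = votes.most_common(1)[0][0]
        for r, c in mask:
            refined[r][c] = majority
    return refined
```

In this sketch a stray pixel of class 1 inside a mask dominated by class 0 is relabeled as 0, which is the kind of pseudo-label cleanup the abstract describes.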
LG | Nov 18, 2023 | Code
Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-making
Yueyuan Li, Songan Zhang, Mingyang Jiang et al.
Simulation is a prospective method for generating diverse and realistic traffic scenarios to aid in the development of driving decision-making systems. However, existing simulators often fall short in diverse scenarios or interactive behavior models for traffic participants. This deficiency underscores the need for a flexible, reliable, user-friendly open-source simulator. Addressing this challenge, Tactics2D adopts a modular approach to traffic scenario construction, encompassing road elements, traffic regulations, behavior models, physics simulations for vehicles, and event detection mechanisms. By integrating numerous commonly utilized algorithms and configurations, Tactics2D empowers users to construct their driving scenarios effortlessly, just like assembling building blocks. Users can effectively evaluate the performance of driving decision-making models across various scenarios by leveraging both public datasets and user-collected real-world data. For access to the source code and community support, please visit the official GitHub page for Tactics2D at https://github.com/WoodOxen/Tactics2D.
LG | Aug 1, 2023
Dynamic ensemble selection based on Deep Neural Network Uncertainty Estimation for Adversarial Robustness
Ruoxi Qin, Linyuan Wang, Xuehui Du et al.
Deep neural networks have attained significant efficiency in image recognition. However, their recognition robustness is vulnerable under the extensive data uncertainty of practical applications. The uncertainty is attributed to inevitable ambient noise and, more importantly, to possible adversarial attacks. Dynamic methods can effectively improve the defense initiative in the arms race between adversarial attack and defense. Unlike previous dynamic methods, which depend on the input or the decision, this work explores dynamic attributes at the model level through dynamic ensemble selection to further protect the model from white-box attacks and improve robustness. Specifically, in the training phase the Dirichlet distribution is applied as the prior of the sub-models' predictive distribution, and a diversity constraint in parameter space is introduced for the lightweight sub-models to construct alternative ensemble model spaces. In the test phase, certain sub-models are dynamically selected based on the rank of their uncertainty values for the final prediction, to ensure the majority-accurate principle in ensemble robustness and accuracy. Compared with the previous dynamic method and static adversarial training models, the presented approach achieves significant robustness without damaging accuracy by combining the dynamics and diversity properties.
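A minimal sketch of uncertainty-ranked sub-model selection follows. The abstract does not state which uncertainty measure is used, so the quantity K / sum(alpha), borrowed from evidential deep learning, is an assumption, as are the function names and the choice to average the expected class probabilities of the selected sub-models.

```python
def dirichlet_uncertainty(alpha):
    """Uncertainty mass K / sum(alpha) for Dirichlet parameters alpha (assumed measure)."""
    return len(alpha) / sum(alpha)

def select_and_predict(alphas, k):
    """Pick the k sub-models with the lowest uncertainty and average their
    expected class probabilities.

    alphas: one list of Dirichlet parameters per sub-model.
    Returns (predicted class index, indices of the chosen sub-models).
    """
    ranked = sorted(range(len(alphas)),
                    key=lambda i: dirichlet_uncertainty(alphas[i]))
    chosen = ranked[:k]
    n_classes = len(alphas[0])
    avg = [0.0] * n_classes
    for i in chosen:
        total = sum(alphas[i])
        for c in range(n_classes):
            # Expected probability of class c under sub-model i's Dirichlet.
            avg[c] += alphas[i][c] / total / k
    return max(range(n_classes), key=avg.__getitem__), chosen
```

Because the chosen subset depends on the input's uncertainty ranking, the effective ensemble changes from one query to the next, which is the dynamic property the abstract emphasizes.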
LG | Aug 16, 2024
A Mean Field Ansatz for Zero-Shot Weight Transfer
Xingyuan Chen, Wenwei Kuang, Lei Deng et al.
The pre-training cost of large language models (LLMs) is prohibitive. One cutting-edge approach to reduce the cost is zero-shot weight transfer, also known as model growth in some cases, which transfers the weights trained in a small model to a large model. However, there are still some theoretical mysteries behind weight transfer. In this paper, inspired by prior applications of mean field theory to neural network dynamics, we introduce a mean field ansatz to provide a theoretical explanation for weight transfer. Specifically, we propose the row-column (RC) ansatz under the mean field point of view, which describes the measure structure of the weights in the neural network (NN) and admits closed measure dynamics. Thus, the weights of NNs of different sizes admit a common distribution under proper assumptions, and weight transfer methods can be viewed as sampling methods. We empirically validate the RC ansatz by exploring simple MLP examples and LLMs such as GPT-3 and Llama-3.1. We show that the mean-field point of view is adequate under suitable assumptions, which can provide theoretical support for zero-shot weight transfer.
LG | May 30, 2019 | Code
Adversarial Sub-sequence for Text Generation
Xingyuan Chen, Yanzhe Li, Peng Jin et al.
Generative adversarial nets (GANs) have been successfully introduced for generating text to alleviate exposure bias. However, discriminators in these models only evaluate the entire sequence, which causes feedback sparsity and mode collapse. To tackle these problems, we propose a novel mechanism. It first segments the entire sequence into several sub-sequences. Then these sub-sequences, together with the entire sequence, are evaluated individually by the discriminator. Finally, these feedback signals are all used to guide the learning of the GAN. This mechanism learns the generation of both the entire sequence and the sub-sequences simultaneously. Learning to generate sub-sequences is easy and is helpful in generating an entire sequence. It is easy to improve existing GAN-based models with this mechanism. We rebuild three previous well-designed models with our mechanism, and the experimental results on benchmark data show that these models are improved significantly; the best one outperforms the state-of-the-art model. All code and data are available at https://github.com/liyzcj/seggan.git
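The segment-then-score mechanism can be sketched as follows. The abstract does not specify how sequences are segmented or how the feedback signals are combined, so the near-equal contiguous split and the plain average are assumptions, and `discriminator` is any hypothetical callable that maps a token list to a score.

```python
def split_into_subsequences(tokens, n_segments):
    """Split tokens into up to n_segments contiguous chunks of near-equal length."""
    size, extra = divmod(len(tokens), n_segments)
    chunks, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when the sequence is short
            chunks.append(tokens[start:end])
        start = end
    return chunks

def combined_feedback(tokens, discriminator, n_segments=2):
    """Average the discriminator score over the full sequence and its sub-sequences."""
    pieces = [tokens] + split_into_subsequences(tokens, n_segments)
    scores = [discriminator(piece) for piece in pieces]
    return sum(scores) / len(scores)
```

Scoring the pieces separately gives the generator several feedback signals per sample instead of one, which is how the abstract proposes to reduce feedback sparsity.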
54.6 | RO | May 8
PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNN
Yixiong Jing, Xingyuan Chen, Guangming Wang et al.
Physics-based digital twins aim to predict the dynamics of real-world objects under interaction, enabling real-to-sim-to-real applications in robotics. Current approaches reconstruct such twins as explicit physical models (such as spring-mass systems) to predict the dynamics, but the resulting models often inherit the resolution of the visual reconstruction rather than being reduced to the physical complexity required to reproduce task-relevant dynamics. This mismatch introduces redundant topology, making repeated forward-dynamics rollouts unnecessarily expensive. To address this challenge, we present PhySPRING, a fully differentiable GNN-based method to reduce complexity in spring-mass digital twins. PhySPRING jointly learns a hierarchy of coarsened graph topologies and their mechanical parameters from observations. At each reduction level, PhySPRING merges nodes with similar learned dynamic responses to optimize the topology, while maintaining every reduced layer as an explicit spring-mass system. On the PhysTwin benchmark, PhySPRING improves dense reconstruction and prediction accuracy over PhysTwin, while reduced models retain stable physical and visual fidelity with up to a 2.30 times speed-up. We further demonstrate the effectiveness of PhySPRING in a Real2Sim robot policy-evaluation pipeline, where the reduced models are substituted zero-shot into ACT and $π_0$ evaluations, maintaining comparable manipulation success rates across downsampling levels while improving action-sampling effectiveness. Together, PhySPRING enables efficient and structure-preserving spring-mass reduction without sacrificing fidelity or robotic utility.
31.5 | LG | May 5
Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection
Yihang Peng, Peng Jin, Jie Gong et al.
Parameter-efficient fine-tuning (PEFT) has become a practical route for adapting large language models to downstream tasks, with LoRA-style methods being particularly attractive because they are inexpensive to train and easy to deploy. Most LoRA variants, however, revise the update rule within the weight space of each layer and leave the intermediate representations formed by deeper layers largely unused. We propose Echo-LoRA, a cross-layer representation injection method for parameter-efficient fine-tuning. During training, Echo-LoRA collects boundary hidden states from deeper source layers, aggregates them into a sample-level echo representation, and uses lightweight projection and gating networks to inject the resulting signal into shallow LoRA or DoRA modules. Answer-only masking, masked distillation, and stochastic routing are used to keep this auxiliary path stable and to reduce the gap between training and inference. On eight commonsense reasoning benchmarks, Echo-LoRA exceeds the reported LoRA baselines by 5.7 percentage points on average across LLaMA-7B, LLaMA2-7B, and LLaMA3-8B. Under reproduced LoRA baselines in our unified implementation, the average gain is 3.0 points; when combined with DoRA, the gain is 2.7 points. The Echo path is discarded after training, so the deployed model keeps the original low-rank LoRA/DoRA form and adds neither inference-time parameters nor inference computation.
66.5 | LG | May 3
Robust Conditional Conformal Prediction via Branched Normalizing Flow
Rui Xu, Xingyuan Chen, Wenxing Huang et al.
Conformal prediction (CP) constructs prediction sets with marginal coverage guarantees under the assumption that the calibration and test distributions are identical. However, under distribution shift, existing approaches primarily align marginal conformal score distributions, which is sufficient to preserve marginal coverage but does not control the conditional coverage error at individual test inputs. As a consequence, CP can remain unreliable in regions where the conditional score distributions are mismatched. In this work, we bound the conditional invalidity of CP under distribution shift in terms of the Wasserstein distance between the calibration and test distributions. This result highlights the role of invertible transport in mitigating conditional coverage degradation. Motivated by this insight, we introduce Branched Normalizing Flow (BNF), a two-branch architecture that normalizes a test input to the calibration distribution and transforms the prediction set of the normalized input back to the test distribution while preserving conditional guarantees. Empirically, BNF consistently improves conditional coverage robustness on nine datasets across a wide range of confidence levels.
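For context, the marginal-coverage baseline that the abstract starts from can be sketched with standard split conformal prediction. This is the textbook procedure, not the Branched Normalizing Flow method. The score 1 - p(label) and the function names are illustrative assumptions.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(cal_scores)[rank - 1]

def prediction_set(class_probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return [y for y, p in enumerate(class_probs) if 1 - p <= threshold]
```

The threshold is computed once from calibration scores and applied to every test input, which is why the resulting guarantee is marginal rather than conditional on the individual input.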
LG | Oct 15, 2025
Federated Conditional Conformal Prediction via Generative Models
Rui Xu, Xingyuan Chen, Wenxing Huang et al.
Conformal Prediction (CP) provides distribution-free uncertainty quantification by constructing prediction sets that guarantee coverage of the true labels. This reliability makes CP valuable for high-stakes federated learning scenarios such as multi-center healthcare. However, standard CP assumes i.i.d. data, which is violated in federated settings where client distributions differ substantially. Existing federated CP methods address this by maintaining marginal coverage on each client, but such guarantees often fail to reflect input-conditional uncertainty. In this work, we propose Federated Conditional Conformal Prediction (Fed-CCP) via generative models, which aims for conditional coverage that adapts to local data heterogeneity. Fed-CCP leverages generative models, such as normalizing flows or diffusion models, to approximate conditional data distributions without requiring the sharing of raw data. This enables each client to locally calibrate conformal scores that reflect its unique uncertainty, while preserving global consistency through federated aggregation. Experiments on real datasets demonstrate that Fed-CCP achieves more adaptive prediction sets.
LG | Jan 3, 2022
Application of Machine Learning Methods in Inferring Surface Water Groundwater Exchanges using High Temporal Resolution Temperature Measurements
Mohammad A. Moghaddam, Ty P. A. Ferre, Xingyuan Chen et al.
We examine the ability of machine learning (ML) and deep learning (DL) algorithms to infer surface/ground exchange flux based on subsurface temperature observations. The observations and fluxes are produced from a high-resolution numerical model representing conditions in the Columbia River near the Department of Energy Hanford site located in southeastern Washington State. Random measurement error, of varying magnitude, is added to the synthetic temperature observations. The results indicate that both ML and DL methods can be used to infer the surface/ground exchange flux. DL methods, especially convolutional neural networks, outperform the ML methods when used to interpret noisy temperature data with a smoothing filter applied. However, the ML methods also performed well, and they can better identify a reduced number of important observations, which could be useful for measurement network optimization. Surprisingly, the ML and DL methods better inferred upward flux than downward flux. This is in direct contrast to previous findings using numerical models to infer flux from temperature observations, and it may suggest that combined use of ML or DL inference with numerical inference could improve flux estimation beneath river systems.
CR | May 6, 2021
Dynamic Defense Approach for Adversarial Robustness in Deep Neural Networks via Stochastic Ensemble Smoothed Model
Ruoxi Qin, Linyuan Wang, Xingyuan Chen et al.
Deep neural networks have been shown to suffer from critical vulnerabilities under adversarial attacks. This phenomenon stimulated the creation of different attack and defense strategies similar to those adopted in cyberspace security. The dependence of such strategies on attack and defense mechanisms makes the associated algorithms on both sides appear as closely reciprocating processes. The defense strategies are particularly passive in these processes, and enhancing the initiative of such strategies can be an effective way to get out of this arms race. Inspired by the dynamic defense approach in cyberspace, this paper builds a stochastic ensemble smoothing approach based on the defense methods of random smoothing and model ensemble. The proposed method employs network architecture and smoothing parameters as ensemble attributes, and dynamically changes the attribute-based ensemble model before every inference prediction request. The proposed method handles the extreme transferability and vulnerability of ensemble models under white-box attacks. Experimental comparison of ASR-vs-distortion curves under different attack scenarios shows that even the attacker with the highest attack capability cannot easily exceed the attack success rate associated with the ensemble smoothed model, especially under untargeted attacks.
CL | May 4, 2020
Distributional Discrepancy: A Metric for Unconditional Text Generation
Ping Cai, Xingyuan Chen, Peng Jin et al.
The purpose of unconditional text generation is to train a model with real sentences, then generate novel sentences of the same quality and diversity as the training data. However, when different metrics are used for comparing the methods of unconditional text generation, contradictory conclusions are drawn. The difficulty is that both the diversity and quality of the sample should be considered simultaneously when the models are evaluated. To solve this problem, a novel metric of distributional discrepancy (DD) is designed to evaluate generators based on the discrepancy between the generated and real training sentences. However, it cannot compute the DD directly because the distribution of real sentences is unavailable. Thus, we propose a method for estimating the DD by training a neural-network-based text classifier. For comparison, three existing metrics, bi-lingual evaluation understudy (BLEU) versus self-BLEU, language model score versus reverse language model score, and Fréchet embedding distance, along with the proposed DD, are used to evaluate two popular generative models of long short-term memory and generative pretrained transformer 2 on both syntactic and real data. Experimental results show that DD is significantly better than the three existing metrics for ranking these generative models.
CV | Apr 5, 2020
Adding A Filter Based on The Discriminator to Improve Unconditional Text Generation
Xingyuan Chen, Ping Cai, Peng Jin et al.
The autoregressive language model (ALM) trained with maximum likelihood estimation (MLE) is widely used in unconditional text generation. Due to exposure bias, the generated texts still suffer from low quality and diversity. Statistically, this presents as a discrepancy between the real text and the generated text. Some research shows that a discriminator can detect this discrepancy. Because the discriminator can encode more information than the generator, the discriminator has the potential to improve the generator. To alleviate the exposure bias, generative adversarial networks (GANs) use the discriminator to update the generator's parameters directly, but they fail when evaluated precisely. A critical reason for the failure is the difference between the discriminator input and the ALM input. We propose a novel mechanism that adds a filter with the same input as the discriminator. First, the discriminator detects the discrepancy signals and passes them to the filter directly (or by learning). Then, we use the filter to reject some generated samples with a sampling-based method. Thus, the original generative distribution is revised to reduce the discrepancy. We experiment with two ALMs, one RNN-based and one Transformer-based. Evaluated precisely by three metrics, our mechanism consistently outperforms the ALMs and all kinds of GANs across two benchmark data sets.
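One way to picture the filtering step is discriminator-based rejection sampling, sketched below. The abstract only says samples are rejected "with a sampling-based method", so the acceptance rule min(1, r / max_ratio) with r = d / (1 - d) is a swapped-in, generic technique and an assumption here. The names `filter_samples`, `disc_prob`, and `max_ratio` are hypothetical.

```python
import random

def filter_samples(samples, disc_prob, max_ratio, rng=None):
    """Keep each sample with probability min(1, r / max_ratio).

    disc_prob(sample) is the discriminator's estimated probability that the
    sample is real; r = d / (1 - d) is the implied real-to-generated ratio.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    kept = []
    for sample in samples:
        d = min(max(disc_prob(sample), 1e-6), 1 - 1e-6)  # avoid division by zero
        ratio = d / (1 - d)
        if rng.random() < min(1.0, ratio / max_ratio):
            kept.append(sample)
    return kept
```

Samples the discriminator considers realistic pass through almost unchanged, while unlikely ones are mostly dropped, so the filtered distribution moves toward the real one without updating the generator's parameters.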
CV | Sep 28, 2019
The Detection of Distributional Discrepancy for Text Generation
Xingyuan Chen, Ping Cai, Peng Jin et al.
The text generated by neural language models is not as good as real text, which means that the two distributions are different. Generative Adversarial Nets (GANs) are used to alleviate this. However, some researchers argue that GAN variants do not work at all. When both sample quality (such as BLEU) and sample diversity (such as self-BLEU) are taken into account, the GAN variants are even worse than a well-adjusted language model. But BLEU and self-BLEU cannot precisely measure this distributional discrepancy. In fact, how to measure the distributional discrepancy between real text and generated text is still an open problem. In this paper, we theoretically propose two metric functions to measure the distributional difference between real text and generated text. We also put forward a method to estimate them. First, we evaluate a language model with these two functions and find that the difference is huge. Then, we try several methods that use the detected discrepancy signal to improve the generator. However, the difference becomes even bigger than before. In experiments on two existing language GANs, the distributional discrepancy between real text and generated text increases with more adversarial learning rounds. This demonstrates that both of these language GANs fail.
QUANT-PH | Jun 22, 2018
Quantum computing cryptography: Finding cryptographic Boolean functions with quantum annealing by a 2000 qubit D-wave quantum computer
Feng Hu, Lucas Lamata, Mikel Sanz et al.
As the building block in symmetric cryptography, designing Boolean functions satisfying multiple properties is an important problem in sequence ciphers, block ciphers, and hash functions. However, the search of $n$-variable Boolean functions fulfilling global cryptographic constraints is computationally hard due to the super-exponential size $\mathcal{O}(2^{2^n})$ of the space. Here, we introduce a codification of the cryptographically relevant constraints in the ground state of an Ising Hamiltonian, allowing us to naturally encode it in a quantum annealer, which seems to provide a quantum speedup. Additionally, we benchmark small $n$ cases in a D-Wave machine, showing its capacity of devising bent functions, the most relevant set of cryptographic Boolean functions. We have complemented it with local search and chain repair to improve the D-Wave quantum annealer performance related to the low connectivity. This work shows how to codify super-exponential cryptographic problems into quantum annealers and paves the way for reaching quantum supremacy with an adequately designed chip.
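The bent functions mentioned in the abstract can be checked classically for small $n$. The sketch below uses the standard Walsh-Hadamard characterization (a function on an even number $n$ of variables is bent when every Walsh coefficient has magnitude $2^{n/2}$). It is a brute-force verifier, not the Ising encoding from the paper, and the function names are illustrative.

```python
def walsh_spectrum(truth_table):
    """Walsh-Hadamard coefficients W(a) = sum over x of (-1)^(f(x) XOR a.x)."""
    size = len(truth_table)
    spectrum = []
    for a in range(size):
        total = 0
        for x in range(size):
            dot = bin(a & x).count("1") & 1  # inner product a.x over GF(2)
            total += -1 if (truth_table[x] ^ dot) else 1
        spectrum.append(total)
    return spectrum

def is_bent(truth_table):
    """True if every Walsh coefficient has magnitude 2^(n/2)."""
    size = len(truth_table)
    n = size.bit_length() - 1
    if size != 1 << n or n == 0 or n % 2:
        return False  # bent functions exist only for even n >= 2
    target = 1 << (n // 2)
    return all(abs(w) == target for w in walsh_spectrum(truth_table))
```

For example, the truth table `[0, 0, 0, 1]` (the AND of two bits) passes the check, while the linear XOR table `[0, 1, 1, 0]` does not. The cost grows as $4^n$ per candidate over a space of $2^{2^n}$ functions, which illustrates why exhaustive search quickly becomes infeasible.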