Weidong Sun

CV
h-index19
3papers
548citations
Novelty53%
AI Score37

3 Papers

CVSep 4, 2022
Single-source Domain Expansion Network for Cross-Scene Hyperspectral Image Classification

Yuxiang Zhang, Wei Li, Weidong Sun et al.

Currently, cross-scene hyperspectral image (HSI) classification has drawn increasing attention. It is necessary to train a model only on source domain (SD) and directly transferring the model to target domain (TD), when TD needs to be processed in real time and cannot be reused for training. Based on the idea of domain generalization, a Single-source Domain Expansion Network (SDEnet) is developed to ensure the reliability and effectiveness of domain extension. The method uses generative adversarial learning to train in SD and test in TD. A generator including semantic encoder and morph encoder is designed to generate the extended domain (ED) based on encoder-randomization-decoder architecture, where spatial and spectral randomization are specifically used to generate variable spatial and spectral information, and the morphological knowledge is implicitly applied as domain invariant information during domain expansion. Furthermore, the supervised contrastive learning is employed in the discriminator to learn class-wise domain invariant representation, which drives intra-class samples of SD and ED. Meanwhile, adversarial training is designed to optimize the generator to drive intra-class samples of SD and ED to be separated. Extensive experiments on two public HSI datasets and one additional multispectral image (MSI) dataset demonstrate the superiority of the proposed method when compared with state-of-the-art techniques.

ASApr 25, 2025Code
Kimi-Audio Technical Report

KimiTeam, Ding Ding, Zeqian Ju et al.

We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input and discrete tokens as output, and develop a chunk-wise streaming detokenizer based on flow matching. We curate a pre-training dataset that consists of more than 13 million hours of audio data covering a wide range of modalities including speech, sound, and music, and build a pipeline to construct high-quality and diverse post-training data. Initialized from a pre-trained LLM, Kimi-Audio is continual pre-trained on both audio and text data with several carefully designed tasks, and then fine-tuned to support a diverse of audio-related tasks. Extensive evaluation shows that Kimi-Audio achieves state-of-the-art performance on a range of audio benchmarks including speech recognition, audio understanding, audio question answering, and speech conversation. We release the codes, model checkpoints, as well as the evaluation toolkits in https://github.com/MoonshotAI/Kimi-Audio.

RONov 27, 2018
Fast UAV Trajectory Optimization using Bilevel Optimization with Analytical Gradients

Weidong Sun, Gao Tang, Kris Hauser

We present an efficient optimization framework that solves trajectory optimization problems by decoupling state variables from timing variables, thereby decomposing a challenging nonlinear programming (NLP) problem into two easier subproblems. With timing fixed, the state variables can be optimized efficiently using convex optimization, and the timing variables can be optimized in a separate NLP, which forms a bilevel optimization problem. The challenge of obtaining the gradient of the timing variables is solved by sensitivity analysis of parametric NLPs. The exact analytic gradient is computed from the dual solution as a by-product, whereas existing finite-difference techniques require additional optimization. The bilevel optimization framework efficiently optimizes both timing and state variables which is demonstrated on generating trajectories for an unmanned aerial vehicle. Numerical experiments demonstrate that bilevel optimization converges significantly more reliably than a standard NLP solver, and analytical gradients outperform finite differences in terms of computation speed and accuracy. Physical experiments demonstrate its real-time applicability for reactive target tracking tasks.