SY Feb 4, 2016
Gramian-based reachability metrics for bilinear networks
Yingbo Zhao, Jorge Cortés
This paper studies Gramian-based reachability metrics for bilinear control systems. In the context of complex networks, bilinear systems capture scenarios where an actuator can affect not only the state of a node but also the interconnections among nodes. Under the assumption that the input's infinity norm is bounded by some function of the network dynamic matrices, we derive a Gramian-based lower bound on the minimum input energy required to steer the state from the origin to any reachable target state. This result motivates our study of various objects associated with the reachability Gramian to quantify the ease of controllability of the bilinear network: the minimum eigenvalue (worst-case minimum input energy to reach a state), the trace (average minimum input energy to reach a state), and the determinant (volume of the ellipsoid containing the states reachable using control inputs with no more than unit energy). We establish an increasing returns property of the reachability Gramian as a function of the actuators, which in turn allows us to derive a general lower bound on the reachability metrics in terms of the aggregate contribution of the individual actuators. We conclude by examining the effect on the worst-case minimum input energy of adding bilinear inputs to difficult-to-control linear symmetric networks. We show that the bilinear networks resulting from the addition of inputs either at a finite number of interconnections or at all self-loops, with weight vanishing with the network scale, remain difficult to control. Various examples illustrate our results.
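The three Gramian metrics named in the abstract can be illustrated with a minimal sketch. It assumes a discrete-time bilinear model x(k+1) = A x(k) + sum_j F_j x(k) u_j(k) + B u(k) and the commonly used generalized Lyapunov equation W = A W A^T + sum_j F_j W F_j^T + B B^T for its reachability Gramian; the abstract does not state the exact model or definition, so this form, the function names, and the 3-node example network are all assumptions made for illustration.

```python
import numpy as np

def bilinear_gramian(A, F_list, B, iters=500):
    """Fixed-point iteration W <- A W A^T + sum_j F_j W F_j^T + B B^T.

    Assumed Gramian definition (generalized Lyapunov equation); the
    iteration converges only when the dynamics are sufficiently contractive.
    """
    n = A.shape[0]
    W = np.zeros((n, n))
    Q = B @ B.T
    for _ in range(iters):
        W = A @ W @ A.T + sum(F @ W @ F.T for F in F_list) + Q
    return W

def reachability_metrics(W):
    """Minimum eigenvalue, trace, and determinant of a symmetric Gramian."""
    eigvals = np.linalg.eigvalsh(W)  # ascending order
    return {
        "lambda_min": eigvals[0],
        "trace": np.trace(W),
        "det": np.linalg.det(W),
    }

# Hypothetical 3-node symmetric line network, actuated at node 1.
A = np.array([[0.5, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.1, 0.5]])
B = np.array([[1.0], [0.0], [0.0]])

# Hypothetical bilinear term: the input also modulates the edge between nodes 2 and 3.
F = np.zeros((3, 3))
F[1, 2] = F[2, 1] = 0.1

W = bilinear_gramian(A, [F], B)
W_lin = bilinear_gramian(A, [], B)  # linear network, no bilinear input

metrics = reachability_metrics(W)
metrics_lin = reachability_metrics(W_lin)
print(metrics)
print(metrics_lin)
```

Comparing `metrics` with `metrics_lin` gives a rough feel for how much a bilinear input changes each metric in this toy network; it says nothing about the scaling results claimed in the abstract.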
CL Dec 29, 2025
MiMo-Audio: Audio Language Models are Few-Shot Learners
Xiaomi LLM-Core Team, Dong Zhang, Gang Wang et al.
Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities, generating highly realistic talk shows, recitations, livestreams, and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio), and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.
CV Nov 20, 2025
Degradation-Aware Hierarchical Termination for Blind Quality Enhancement of Compressed Video
Li Yu, Yingbo Zhao, Shiyu Wu et al.
Existing studies on Quality Enhancement for Compressed Video (QECV) predominantly rely on known Quantization Parameters (QPs), employing a distinct enhancement model per QP setting; these are termed non-blind methods. However, in real-world scenarios involving transcoding or transmission, QPs may be partially or entirely unknown, limiting the applicability of such approaches and motivating the development of blind QECV techniques. Current blind methods generate degradation vectors via classification models with cross-entropy loss, using them as channel attention to guide artifact removal. However, these vectors capture only global degradation information and lack spatial details, hindering adaptation to varying artifact patterns at different spatial positions. To address these limitations, we propose a pretrained Degradation Representation Learning (DRL) module that decouples and extracts high-dimensional, multiscale degradation representations from video content to guide the artifact removal. Additionally, both blind and non-blind methods typically employ uniform architectures across QPs, thus overlooking the varying computational demands inherent to different compression levels. We therefore introduce a hierarchical termination mechanism that dynamically adjusts the number of artifact reduction stages based on the compression level. Experimental results demonstrate that the proposed approach significantly enhances performance, achieving a PSNR improvement of 110% (from 0.31 dB to 0.65 dB) over a competing state-of-the-art blind method at QP = 22. Furthermore, the proposed hierarchical termination mechanism reduces the average inference time at QP = 22 by half compared to QP = 42.
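The quoted 110% figure can be checked with a short calculation. This sketch assumes that 0.31 dB and 0.65 dB are the PSNR gains of the competing method and the proposed method respectively, and that the percentage is the relative increase between them; the helper name is hypothetical.

```python
def relative_gain_percent(baseline_db, proposed_db):
    """Relative increase of proposed_db over baseline_db, in percent."""
    return 100.0 * (proposed_db - baseline_db) / baseline_db

# Values quoted in the abstract at QP = 22 (interpretation assumed, see above).
gain = relative_gain_percent(0.31, 0.65)
print(round(gain, 1))  # 109.7
```

Under that reading the relative increase is about 109.7%, consistent with the rounded 110% in the abstract.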
SY Sep 7, 2017
Network Identification with Latent Nodes via Auto-Regressive Models
Erfan Nozari, Yingbo Zhao, Jorge Cortés
We consider linear time-invariant networks with unknown topology where only a manifest subset of the nodes can be directly actuated and measured while the state of the remaining latent nodes and their number are unknown. Our goal is to identify the transfer function of the manifest subnetwork and determine whether interactions between manifest nodes are direct or mediated by latent nodes. We show that, if there are no inputs to the latent nodes, the manifest transfer function can be approximated arbitrarily well in the H-infinity norm sense by the transfer function of an auto-regressive model and present a least-squares estimation method to construct the auto-regressive model from measured data. We show that the least-squares auto-regressive method guarantees an arbitrarily small H-infinity norm error in the approximation of the manifest transfer function, exponentially decaying once the model order exceeds a certain threshold. Finally, we show that when the latent subnetwork is acyclic, the proposed method achieves perfect identification of the manifest transfer function above a specific model order as the length of the data increases. Various examples illustrate our results.
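The least-squares step can be sketched for a simplified case. The example below assumes a single manifest node with scalar input u and output y, a noise-free auto-regressive model with exogenous input of the form y(k) = sum_i a_i y(k-i) + sum_i b_i u(k-i), and a known model order; the paper's multi-node setting and exact model structure are not given in the abstract, so the function name, coefficients, and data here are illustrative only.

```python
import numpy as np

def fit_arx(y, u, order):
    """Least-squares fit of y(k) = sum_i a_i y(k-i) + sum_i b_i u(k-i)."""
    rows = [
        # Regressor: [y(k-1), ..., y(k-order), u(k-1), ..., u(k-order)]
        np.concatenate([y[k - order:k][::-1], u[k - order:k][::-1]])
        for k in range(order, len(y))
    ]
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[order:], rcond=None)
    return theta[:order], theta[order:]

# Hypothetical stable second-order model used to generate data.
a_true = np.array([0.6, -0.2])
b_true = np.array([1.0, 0.5])

rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)  # random excitation of the manifest node
y = np.zeros(N)
for k in range(2, N):
    y[k] = a_true @ y[k - 2:k][::-1] + b_true @ u[k - 2:k][::-1]

a_hat, b_hat = fit_arx(y, u, order=2)
print(a_hat, b_hat)
```

With noise-free data and a sufficiently exciting input, the fit recovers the generating coefficients up to numerical precision; with measurement noise or an underestimated order, the estimates would only approximate them.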