MMOct 31, 2025Code
LongCat-Flash-Omni Technical ReportMeituan LongCat Team, Bairui Wang, Bayan et al.
We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters, excelling at real-time audio-visual interaction. By adopting a curriculum-inspired progressive training strategy that transitions from simpler to increasingly complex modality sequence modeling tasks, LongCat-Flash-Omni attains comprehensive multimodal capabilities while maintaining strong unimodal capability. Building upon LongCat-Flash, which adopts a high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, LongCat-Flash-Omni integrates efficient multimodal perception and speech reconstruction modules. Despite its immense size of 560B parameters (with 27B activated), LongCat-Flash-Omni achieves low-latency real-time audio-visual interaction. For training infrastructure, we developed a modality-decoupled parallelism scheme specifically designed to manage the data and model heterogeneity inherent in large-scale multimodal training. This innovative approach demonstrates exceptional efficiency by sustaining over 90% of the throughput achieved by text-only training. Extensive evaluations show that LongCat-Flash-Omni achieves state-of-the-art performance on omni-modal benchmarks among open-source models. Furthermore, it delivers highly competitive results across a wide range of modality-specific tasks, including text, image, and video understanding, as well as audio understanding and generation. We provide a comprehensive overview of the model architecture design, training procedures, and data strategies, and open-source the model to foster future research and development in the community.
CVJan 14, 2023
Model-based Transfer Learning for Automatic Optical Inspection based on domain discrepancyErik Isai Valle Salgado, Haoxin Yan, Yue Hong et al.
Transfer learning is a promising method for AOI applications since it can significantly shorten sample collection time and improve efficiency in today's smart manufacturing. However, related research enhanced the network models by applying TL without considering the domain similarity among datasets, the data long-tailedness of a source dataset, and mainly used linear transformations to mitigate the lack of samples. This research applies model-based TL via domain similarity to improve the overall performance and data augmentation in both target and source domains to enrich the data quality and reduce the imbalance. Given a group of source datasets from similar industrial processes, we define which group is the most related to the target through the domain discrepancy score and the number of samples each has. Then, we transfer the chosen pre-trained backbone weights to train and fine-tune the target network. Our research suggests increases in the F1 score and the PR curve up to 20% compared with TL using benchmark datasets.
LGMar 25, 2024
Enhancing Industrial Transfer Learning with Style Filter: Cost Reduction and Defect-FocusChen Li, Ruijie Ma, Xiang Qian et al.
Addressing the challenge of data scarcity in industrial domains, transfer learning emerges as a pivotal paradigm. This work introduces Style Filter, a tailored methodology for industrial contexts. By selectively filtering source domain data before knowledge transfer, Style Filter reduces the quantity of data while maintaining or even enhancing the performance of transfer learning strategy. Offering label-free operation, minimal reliance on prior knowledge, independence from specific models, and re-utilization, Style Filter is evaluated on authentic industrial datasets, highlighting its effectiveness when employed before conventional transfer strategies in the deep learning domain. The results underscore the effectiveness of Style Filter in real-world industrial applications.
ARSep 8, 2021
IceClave: A Trusted Execution Environment for In-Storage ComputingLuyi Kang, Yuqi Xue, Weiwei Jia et al.
In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviate the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them treat the in-storage security as the first citizen. Specifically, since modern SSD controllers do not have a trusted execution environment, an offloaded (malicious) program could steal, modify, and even destroy the data stored in the SSD. In this paper, we first investigate the attacks that could be conducted by offloaded in-storage programs. To defend against these attacks, we build a lightweight trusted execution environment, named IceClave for in-storage computing. IceClave enables security isolation between in-storage programs and flash management functions that include flash address translation, data access control, and garbage collection, with TrustZone extensions. IceClave also achieves security isolation between in-storage programs by enforcing memory integrity verification of in-storage DRAM with low overhead. To protect data loaded from flash chips, IceClave develops a lightweight data encryption/decryption mechanism in flash controllers. We develop IceClave with a full system simulator. We evaluate IceClave with a variety of data-intensive applications such as databases. Compared to state-of-the-art in-storage computing approaches, IceClave introduces only 7.6% performance overhead, while enforcing security isolation in the SSD controller with minimal hardware cost. IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.31$\times$ better performance than the conventional host-based trusted computing approach.
CVApr 24, 2019
Defocused images removal of axial overlapping scattering particles by using three-dimensional nonlinear diffusion based on digital holographyWei-Na Li, Zhengyun Zhang, Jianshe Ma et al.
We propose a three-dimensional nonlinear diffusion method to implement the similar autofocusing function of multiple micro-objects and simultaneously remove the defocused images, which can distinguish the locations of certain sized scattering particles that are overlapping along z-axis. It is applied to all of the reconstruction slices that are generated from the captured hologram after each back propagation. For certain small sized particles, the maxima of maximum gradient magnitude of each reconstruction slice appears at the ground truth z position after applying the proposed scheme when the reconstruction range along z-axis is sufficiently long and the reconstruction depth spacing is sufficiently fine. Therefore, the reconstructed image at ground truth z position is remained, while the defocused images are diffused out. The results demonstrated that the proposed scheme can diffuse out the defocused images which are 20 um away from the ground truth z position in spite of that several scattering particles with different diameters are completely overlapping along z-axis with a distance of 800 um when the hologram pixel pitch is 2 um. It also demonstrated that the sparsity distribution of the ground truth z slice cannot be affected by the sparsity distribution of corresponding defocused images when the diameter of the particle is not more than 35um and the reconstruction depth spacing is not less than 20 um.