Yi Qu

CV
h-index27
6papers
128citations
Novelty48%
AI Score43

6 Papers

CVAug 15, 2023Code
ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition

Yixuan Zhou, Yi Qu, Xing Xu et al.

Class imbalance is a common challenge in real-world recognition tasks, where the majority of classes have few samples, also known as tail classes. We address this challenge with the perspective of generalization and empirically find that the promising Sharpness-Aware Minimization (SAM) fails to address generalization issues under the class-imbalanced setting. Through investigating this specific type of task, we identify that its generalization bottleneck primarily lies in the severe overfitting for tail classes with limited training data. To overcome this bottleneck, we leverage class priors to restrict the generalization scope of the class-agnostic SAM and propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM). With the guidance of class priors, our ImbSAM specifically improves generalization targeting tail classes. We also verify the efficacy of ImbSAM on two prototypical applications of class-imbalanced recognition: long-tailed classification and semi-supervised anomaly detection, where our ImbSAM demonstrates remarkable performance improvements for tail classes and anomaly. Our code implementation is available at https://github.com/cool-xuan/Imbalanced_SAM.

CVNov 26, 2023Code
BatchNorm-based Weakly Supervised Video Anomaly Detection

Yixuan Zhou, Yi Qu, Xing Xu et al.

In weakly supervised video anomaly detection (WVAD), where only video-level labels indicating the presence or absence of abnormal events are available, the primary challenge arises from the inherent ambiguity in temporal annotations of abnormal occurrences. Inspired by the statistical insight that temporal features of abnormal events often exhibit outlier characteristics, we propose a novel method, BN-WVAD, which incorporates BatchNorm into WVAD. In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos. The proposed DFM criterion is also discriminative for anomaly recognition and more resilient to label noise, serving as the additional anomaly score to amend the prediction of the anomaly classifier that is susceptible to noisy labels. Moreover, a batch-level selection strategy is devised to filter more abnormal snippets in videos where more abnormal events occur. The proposed BN-WVAD model demonstrates state-of-the-art performance on UCF-Crime with an AUC of 87.24%, and XD-Violence, where AP reaches up to 84.93%. Our code implementation is accessible at https://github.com/cool-xuan/BN-WVAD.

AIDec 17, 2025
Evaluating Large Language Models in Scientific Discovery

Zhangde Song, Jieyu Lu, Yuanqi Du et al.

Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define research projects of genuine interest and decompose them into modular research scenarios from which vetted questions are sampled. The framework assesses models at two levels: (i) question-level accuracy on scenario-tied items and (ii) project-level performance, where models must propose testable hypotheses, design simulations or experiments, and interpret results. Applying this two-phase scientific discovery evaluation (SDE) framework to state-of-the-art LLMs reveals a consistent performance gap relative to general science benchmarks, diminishing return of scaling up model sizes and reasoning, and systematic weaknesses shared across top-tier models from different providers. Large performance variation in research scenarios leads to changing choices of the best performing model on scientific discovery projects evaluated, suggesting all current LLMs are distant to general scientific "superintelligence". Nevertheless, LLMs already demonstrate promise in a great variety of scientific discovery projects, including cases where constituent scenario scores are low, highlighting the role of guided exploration and serendipity in discovery. This SDE framework offers a reproducible benchmark for discovery-relevant evaluation of LLMs and charts practical paths to advance their development toward scientific discovery.

LGMay 30, 2023Code
AnoOnly: Semi-Supervised Anomaly Detection with the Only Loss on Anomalies

Yixuan Zhou, Peiyu Yang, Yi Qu et al.

Semi-supervised anomaly detection (SSAD) methods have demonstrated their effectiveness in enhancing unsupervised anomaly detection (UAD) by leveraging few-shot but instructive abnormal instances. However, the dominance of homogeneous normal data over anomalies biases the SSAD models against effectively perceiving anomalies. To address this issue and achieve balanced supervision between heavily imbalanced normal and abnormal data, we develop a novel framework called AnoOnly (Anomaly Only). Unlike existing SSAD methods that resort to strict loss supervision, AnoOnly suspends it and introduces a form of weak supervision for normal data. This weak supervision is instantiated through the utilization of batch normalization, which implicitly performs cluster learning on normal data. When integrated into existing SSAD methods, the proposed AnoOnly demonstrates remarkable performance enhancements across various models and datasets, achieving new state-of-the-art performance. Additionally, our AnoOnly is natively robust to label noise when suffering from data contamination. Our code is publicly available at https://github.com/cool-xuan/AnoOnly.

CVJan 4, 2024
A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

Rui Ma, Qiang Zhou, Yizhu Jin et al.

Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works. However, recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. These technologies enable the unauthorized learning and replication of copyrighted content, artistic creations, and likenesses, leading to the proliferation of unregulated content. Notably, models like stable diffusion, which excel in text-to-image synthesis, heighten the risk of copyright infringement and unauthorized distribution.Machine unlearning, which seeks to eradicate the influence of specific data or concepts from machine learning models, emerges as a promising solution by eliminating the \enquote{copyright memories} ingrained in diffusion models. Yet, the absence of comprehensive large-scale datasets and standardized benchmarks for evaluating the efficacy of unlearning techniques in the copyright protection scenarios impedes the development of more effective unlearning methods. To address this gap, we introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset. This dataset encompasses anchor images, associated prompts, and images synthesized by text-to-image models. Additionally, we have developed a mixed metric based on semantic and style information, validated through both human and artist assessments, to gauge the effectiveness of unlearning approaches. Our dataset, benchmark library, and evaluation metrics will be made publicly available to foster future research and practical applications (https://rmpku.github.io/CPDM-page/, website / http://149.104.22.83/unlearning.tar.gz, dataset).

CHEM-PHMay 13, 2025
Building-Block Aware Generative Modeling for 3D Crystals of Metal Organic Frameworks

Chenru Duan, Aditya Nandy, Sizhan Liu et al.

Metal-organic frameworks (MOFs) marry inorganic nodes, organic edges, and topological nets into programmable porous crystals, yet their astronomical design space defies brute-force synthesis. Generative modeling holds ultimate promise, but existing models either recycle known building blocks or are restricted to small unit cells. We introduce Building-Block-Aware MOF Diffusion (BBA MOF Diffusion), an SE(3)-equivariant diffusion model that learns 3D all-atom representations of individual building blocks, encoding crystallographic topological nets explicitly. Trained on the CoRE-MOF database, BBA MOF Diffusion readily samples MOFs with unit cells containing 1000 atoms with great geometric validity, novelty, and diversity mirroring experimental databases. Its native building-block representation produces unprecedented metal nodes and organic edges, expanding accessible chemical space by orders of magnitude. One high-scoring [Zn(1,4-TDC)(EtOH)2] MOF predicted by the model was synthesized, where powder X-ray diffraction, thermogravimetric analysis, and N2 sorption confirm its structural fidelity. BBA-Diff thus furnishes a practical pathway to synthesizable and high-performing MOFs.