Xiang Pan

LG
h-index98
27papers
1,753citations
Novelty48%
AI Score48

27 Papers

LGDec 12, 2022
Accelerating Dataset Distillation via Model Augmentation

Lei Zhang, Jie Zhang, Bowen Lei et al. · microsoft-research

Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones. Existing DD methods based on gradient matching achieve leading performance; however, they are extremely computationally intensive as they require continuously optimizing a dataset among thousands of randomly initialized models. In this paper, we assume that training the synthetic data with diverse models leads to better generalization performance. Thus we propose two model augmentation techniques, i.e. using early-stage models and parameter perturbation to learn an informative synthetic set with significantly reduced training cost. Extensive experiments demonstrate that our method achieves up to 20x speedup and comparable performance on par with state-of-the-art methods.

LGMar 26, 2022
A Roadmap for Big Model

Sha Yuan, Hanyu Zhao, Shuai Zhao et al. · bytedance, pku

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

SYSep 23, 2020
DeepOPF: Deep Neural Network for DC Optimal Power Flow

Xiang Pan, Tianyu Zhao, Minghua Chen

We develop DeepOPF as a Deep Neural Network (DNN) approach for solving direct current optimal power flow (DC-OPF) problems. DeepOPF is inspired by the observation that solving DC-OPF for a given power network is equivalent to characterizing a high-dimensional mapping between the load inputs and the dispatch and transmission decisions. We construct and train a DNN model to learn such mapping, then we apply it to obtain optimized operating decisions upon arbitrary load inputs. We adopt uniform sampling to address the over-fitting problem common in generic DNN approaches. We leverage on a useful structure in DC-OPF to significantly reduce the mapping dimension, subsequently cutting down the size of our DNN model and the amount of training data/time needed. We also design a post-processing procedure to ensure the feasibility of the obtained solution. Simulation results of IEEE test cases show that DeepOPF always generates feasible solutions with negligible optimality loss, while speeding up the computing time by two orders of magnitude as compared to conventional approaches implemented in a state-of-the-art solver.

LGJun 7, 2022
DeepOPF-AL: Augmented Learning for Solving AC-OPF Problems with Multiple Load-Solution Mappings

Xiang Pan, Wanjun Huang, Minghua Chen et al.

The existence of multiple load-solution mappings of non-convex AC-OPF problems poses a fundamental challenge to deep neural network (DNN) schemes. As the training dataset may contain a mixture of data points corresponding to different load-solution mappings, the DNN can fail to learn a legitimate mapping and generate inferior solutions. We propose DeepOPF-AL as an augmented-learning approach to tackle this issue. The idea is to train a DNN to learn a unique mapping from an augmented input, i.e., (load, initial point), to the solution generated by an iterative OPF solver with the load and initial point as intake. We then apply the learned augmented mapping to solve AC-OPF problems much faster than conventional solvers. Simulation results over IEEE test cases show that DeepOPF-AL achieves noticeably better optimality and similar feasibility and speedup performance, as compared to a recent DNN scheme, with the same DNN size yet elevated training complexity.

LGOct 4, 2023Code
PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting

Yujin Tang, Jiaming Zhou, Xiang Pan et al.

Accurate precipitation forecasting is a vital challenge of societal importance. Though data-driven approaches have emerged as a widely used solution, solely relying on data-driven approaches has limitations in modeling the underlying physics, making accurate predictions difficult. We focus on the Numerical Weather Prediction (NWP) post-processing based precipitation forecasting task to couple Machine Learning techniques with traditional NWP. This task remains challenging due to the imbalanced precipitation data and complex relationships between multiple meteorological variables. To address these limitations, we introduce the \textbf{PostRainBench}, a comprehensive multi-variable NWP post-processing benchmark, and \textbf{CAMT}, a simple yet effective Channel Attention Enhanced Multi-task Learning framework with a specially designed weighted loss function. Extensive experimental results on the proposed benchmark show that our method outperforms state-of-the-art methods by 6.3\%, 4.7\%, and 26.8\% in rain CSI and improvements of 15.6\%, 17.4\%, and 31.8\% over NWP predictions in heavy rain CSI on respective datasets. Most notably, our model is the first deep learning-based method to outperform NWP approaches in heavy rain conditions. These results highlight the potential impact of our model in reducing the severe consequences of extreme rainfall events. Our datasets and code are available at https://github.com/yyyujintang/PostRainBench.

QMMar 16Code
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening

Xiaoqing Lian, Pengsen Ma, Tengfeng Ma et al.

Motivation: The scalable identification of bioactive compounds is essential for contemporary drug discovery. This process faces a key trade-off: structural screening offers scalability but lacks biological context, whereas high-content phenotypic profiling provides deep biological insights but is resource-intensive. The primary challenge is to extract robust biological signals from noisy data and encode them into representations that do not require biological data at inference. Results: This study presents DECODE (DEcomposing Cellular Observations of Drug Effects), a framework that bridges this gap by empowering chemical representations with intrinsic biological semantics to enable structure-based in silico biological profiling. DECODE leverages limited paired transcriptomic and morphological data as supervisory signals during training, enabling the extraction of a measurement-invariant biological fingerprint from chemical structures and explicit filtering of experimental noise. Our evaluations demonstrate that DECODE retrieves functionally similar drugs in zero-shot settings with over 20% relative improvement over chemical baselines in mechanism-of-action (MOA) prediction. Furthermore, the framework achieves a 6-fold increase in hit rates for novel anti-cancer agents during external validation. Availability and implementation: The codes and datasets of DECODE are available at https://github.com/lian-xiao/DECODE.

CVJun 1, 2023
Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Yang Sui, Zhuohang Li, Ding Ding et al.

Learned Image Compression (LIC) has recently become the trending technique for image transmission due to its notable performance. Despite its popularity, the robustness of LIC with respect to the quality of image reconstruction remains under-explored. In this paper, we introduce an imperceptible attack approach designed to effectively degrade the reconstruction quality of LIC, resulting in the reconstructed image being severely disrupted by noise where any object in the reconstructed images is virtually impossible. More specifically, we generate adversarial examples by introducing a Frobenius norm-based loss function to maximize the discrepancy between original images and reconstructed adversarial examples. Further, leveraging the insensitivity of high-frequency components to human vision, we introduce Imperceptibility Constraint (IC) to ensure that the perturbations remain inconspicuous. Experiments conducted on the Kodak dataset using various LIC models demonstrate effectiveness. In addition, we provide several findings and suggestions for designing future defenses.

CLJun 14, 2022
Task Transfer and Domain Adaptation for Zero-Shot Question Answering

Xiang Pan, Alex Sheng, David Shimshoni et al.

Pretrained language models have shown success in various areas of natural language processing, including reading comprehension tasks. However, when applying machine learning methods to new domains, labeled data may not always be available. To address this, we use supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks. We evaluate zero-shot performance on domain-specific reading comprehension tasks by combining task transfer with domain adaptation to fine-tune a pretrained model with no labelled data from the target task. Our approach outperforms Domain-Adaptive Pretraining on downstream domain-specific reading comprehension tasks in 3 out of 4 domains.

CLOct 25, 2022
Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens

Nitish Joshi, Xiang Pan, He He

The term `spurious correlations' has been used in NLP to informally denote any undesirable feature-label correlations. However, a correlation can be undesirable because (i) the feature is irrelevant to the label (e.g. punctuation in a review), or (ii) the feature's effect on the label depends on the context (e.g. negation words in a review), which is ubiquitous in language tasks. In case (i), we want the model to be invariant to the feature, which is neither necessary nor sufficient for prediction. But in case (ii), even an ideal model (e.g. humans) must rely on the feature, since it is necessary (but not sufficient) for prediction. Therefore, a more fine-grained treatment of spurious features is needed to specify the desired model behavior. We formalize this distinction using a causal model and probabilities of necessity and sufficiency, which delineates the causal relations between a feature and a label. We then show that this distinction helps explain results of existing debiasing methods on different spurious features, and demystifies surprising results such as the encoding of spurious features in model representations after debiasing.

IVNov 29, 2023
Corner-to-Center Long-range Context Model for Efficient Learned Image Compression

Yang Sui, Ding Ding, Xiang Pan et al.

In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression in real-world scenarios. However, performance degradation occurs due to its incomplete casual context. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of information utilized for context prediction and decoding. Based on such analysis, we propose the \textbf{Corner-to-Center transformer-based Context Model (C$^3$M)} designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we leverage the logarithmic-based prediction order to predict more context features from corner to center progressively. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder to capture the long-range semantic information by assigning the different window shapes in different channels. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art parallel methods. Finally, according to the subjective analysis, we suggest that improving the detailed representation in transformer-based image compression is a promising direction to be explored.

IVApr 17, 2024Code
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Xin Li, Kun Yuan, Yajing Pei et al.

This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

LGJul 8, 2024
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

Yijun Dong, Hoang Phan, Xiang Pan et al.

We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.

LGOct 27, 2024Code
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Qi Zhang, Yifei Wang, Jingyi Cui et al.

Deep learning models often suffer from a lack of interpretability due to polysemanticity, where individual neurons are activated by multiple unrelated semantics, resulting in unclear attributions of model behavior. Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability but are commonly believed to compromise accuracy. In this work, we challenge the prevailing belief of the accuracy-interpretability tradeoff, showing that monosemantic features not only enhance interpretability but also bring concrete gains in model performance. Across multiple robust learning scenarios-including input and label noise, few-shot learning, and out-of-domain generalization-our results show that models leveraging monosemantic features significantly outperform those relying on polysemantic features. Furthermore, we provide empirical and theoretical understandings on the robustness gains of feature monosemanticity. Our preliminary analysis suggests that monosemanticity, by promoting better separation of feature representations, leads to more robust decision boundaries. This diverse evidence highlights the generality of monosemanticity in improving model robustness. As a first step in this new direction, we embark on exploring the learning benefits of monosemanticity beyond interpretability, supporting the long-standing hypothesis of linking interpretability and robustness. Code is available at \url{https://github.com/PKU-ML/Beyond_Interpretability}.

CLNov 16, 2021Code
DataCLUE: A Benchmark Suite for Data-centric NLP

Liang Xu, Jiacheng Liu, Xiang Pan et al.

Data-centric AI has recently proven to be more effective and high-performance, while traditional model-centric AI delivers fewer and fewer benefits. It emphasizes improving the quality of datasets to achieve better model performance. This field has significant potential because of its great practicability and getting more and more attention. However, we have not seen significant research progress in this field, especially in NLP. We propose DataCLUE, which is the first Data-Centric benchmark applied in NLP field. We also provide three simple but effective baselines to foster research in this field (improve Macro-F1 up to 5.7% point). In addition, we conduct comprehensive experiments with human annotators and show the hardness of DataCLUE. We also try an advanced method: the forgetting informed bootstrapping label correction method. All the resources related to DataCLUE, including datasets, toolkit, leaderboard, and baselines, is available online at https://github.com/CLUEbenchmark/DataCLUE

CVApr 24, 2024
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman et al.

This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed methods must process 30 FHD frames under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.

MLMar 11, 2024
Bridging Domains with Approximately Shared Features

Ziliang Samuel Zhong, Xiang Pan, Qi Lei

Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains. A fundamental challenge is devising the optimal strategy for feature selection. Existing literature is somewhat paradoxical: some advocate for learning invariant features from source domains, while others favor more diverse features. To address the challenge, we propose a statistical framework that distinguishes the utilities of features based on the variance of their correlation to label $y$ across domains. Under our framework, we design and analyze a learning procedure consisting of learning approximately shared feature representation from source tasks and fine-tuning it on the target task. Our theoretical analysis necessitates the importance of learning approximately shared features instead of only the strictly invariant features and yields an improved population risk compared to previous results on both source and target tasks, thus partly resolving the paradox mentioned above. Inspired by our theory, we proposed a more practical way to isolate the content (invariant+approximately shared) from environmental features and further consolidate our theoretical findings.

CVJun 15, 2025
LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling

Zhihan Zhang, Xiang Pan, Hongchen Wei et al.

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal strategies through iterative search processes, resulting in substantial computational overhead for on-demand MLLMs adaptation. To address this challenge, we propose LOP, an efficient neural pruning framework that learns optimal pruning strategies from the target pruning constraint, eliminating the need for computationally expensive search-based methods. LOP approach trains autoregressive neural networks (NNs) to directly predict layer-wise pruning strategies adaptive to the target pruning constraint, eliminating the time-consuming iterative searches. Experimental results across multiple tasks show that LOP outperforms state-of-the-art pruning methods in various metrics while achieving up to three orders of magnitude speedup.

LGJan 26, 2025
Inductive-Associative Meta-learning Pipeline with Human Cognitive Patterns for Unseen Drug-Target Interaction Prediction

Xiaoqing Lian, Jie Zhu, Tianxu Lv et al.

Significant differences in protein structures hinder the generalization of existing drug-target interaction (DTI) models, which often rely heavily on pre-learned binding principles or detailed annotations. In contrast, BioBridge designs an Inductive-Associative pipeline inspired by the workflow of scientists who base their accumulated expertise on drawing insights into novel drug-target pairs from weakly related references. BioBridge predicts novel drug-target interactions using limited sequence data, incorporating multi-level encoders with adversarial training to accumulate transferable binding principles. On these principles basis, BioBridge employs a dynamic prototype meta-learning framework to associate insights from weakly related annotations, enabling robust predictions for previously unseen drug-target pairs. Extensive experiments demonstrate that BioBridge surpasses existing models, especially for unseen proteins. Notably, when only homologous protein binding data is available, BioBridge proves effective for virtual screening of the epidermal growth factor receptor and adenosine receptor, underscoring its potential in drug discovery.

CVJan 6, 2024
Transferable Learned Image Compression-Resistant Adversarial Perturbations

Yang Sui, Zhuohang Li, Ding Ding et al.

Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of DNN-based image compression. With the rapid evolution of advanced image compression, DNN-based learned image compression has emerged as the promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, due to its superior performance over traditional compression. Therefore, there is a pressing need to fully investigate the robustness of a classification system post-processed by learned image compression. To bridge this research gap, we explore the adversarial attack on a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across various quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method to enable the fast generation of transferable perturbation. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been post-processed with different learned image compression models.

LGDec 15, 2021
Ensuring DNN Solution Feasibility for Optimization Problems with Convex Constraints and Its Application to DC Optimal Power Flow Problems

Tianyu Zhao, Xiang Pan, Minghua Chen et al.

Ensuring solution feasibility is a key challenge in developing Deep Neural Network (DNN) schemes for solving constrained optimization problems, due to inherent DNN prediction errors. In this paper, we propose a ``preventive learning'' framework to guarantee DNN solution feasibility for problems with convex constraints and general objective functions without post-processing, upon satisfying a mild condition on constraint calibration. Without loss of generality, we focus on problems with only inequality constraints. We systematically calibrate inequality constraints used in DNN training, thereby anticipating prediction errors and ensuring the resulting solutions remain feasible. We characterize the calibration magnitudes and the DNN size sufficient for ensuring universal feasibility. We propose a new Adversarial-Sample Aware training algorithm to improve DNN's optimality performance without sacrificing feasibility guarantee. Overall, the framework provides two DNNs. The first one from characterizing the sufficient DNN size can guarantee universal feasibility while the other from the proposed training algorithm further improves optimality and maintains DNN's universal feasibility simultaneously. We apply the framework to develop DeepOPF+ for solving essential DC optimal power flow problems in grid operation. Simulation results over IEEE test cases show that it outperforms existing strong DNN baselines in ensuring 100% feasibility and attaining consistent optimality loss ($<$0.19%) and speedup (up to $\times$228) in both light-load and heavy-load regimes, as compared to a state-of-the-art solver. We also apply our framework to a non-convex problem and show its performance advantage over existing schemes.

CLNov 15, 2021
Calculating Question Similarity is Enough: A New Method for KBQA Tasks

Hanyu Zhao, Sha Yuan, Jiahong Leng et al.

Knowledge Base Question Answering (KBQA) aims to answer natural language questions with the help of an external knowledge base. The core idea is to find the link between the internal knowledge behind questions and known triples of the knowledge base. Traditional KBQA task pipelines contain several steps, including entity recognition, entity linking, answering selection, etc. In this kind of pipeline methods, errors in any procedure will inevitably propagate to the final prediction. To address this challenge, this paper proposes a Corpus Generation - Retrieve Method (CGRM) with Pre-training Language Model (PLM) for the KBQA task. The major novelty lies in the design of the new method, wherein our approach, the knowledge enhanced T5 (kT5) model aims to generate natural language QA pairs based on Knowledge Graph triples and directly solve the QA by retrieving the synthetic dataset. The new method can extract more information about the entities from PLM to improve accuracy and simplify the processes. We test our method on NLPCC-ICCPOL 2016 KBQA dataset, and the results show that our method improves the performance of KBQA and the out straight-forward method is competitive with the state-of-the-art.

CLJul 15, 2021
FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark

Liang Xu, Xiaojing Lu, Chenyang Yuan et al.

Pretrained Language Models (PLMs) have achieved tremendous success in natural language understanding tasks. While different learning schemes -- fine-tuning, zero-shot, and few-shot learning -- have been widely explored and compared for languages such as English, there is comparatively little work in Chinese to fairly and comprehensively evaluate and compare these methods and thus hinders cumulative progress. In this paper, we introduce the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive few-shot evaluation benchmark in Chinese. It includes nine tasks, ranging from single-sentence and sentence-pair classification tasks to machine reading comprehension tasks. We systematically evaluate five state-of-the-art (SOTA) few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning and EFL), and compare their performance with fine-tuning and zero-shot learning schemes on the newly constructed FewCLUE benchmark. Experimental results reveal that: 1) The effect of different few-shot learning methods is sensitive to the pre-trained model to which the methods are applied; 2) PET and P-tuning achieve the best overall performance with RoBERTa and ERNIE respectively. Our benchmark is used in the few-shot learning contest of NLPCC 2021. In addition, we provide a user-friendly toolkit, as well as an online leaderboard to help facilitate further progress on Chinese few-shot learning. We provide a baseline performance on different learning methods, a reference for future research.

SYMar 22, 2021
DeepOPF-V: Solving AC-OPF Problems Efficiently

Wanjun Huang, Xiang Pan, Minghua Chen et al.

AC optimal power flow (AC-OPF) problems need to be solved more frequently in the future to maintain stable and economic power system operation. To tackle this challenge, a deep neural network-based voltage-constrained approach (DeepOPF-V) is proposed to solve AC-OPF problems with high computational efficiency. Its unique design predicts voltages of all buses and then uses them to reconstruct the remaining variables without solving non-linear AC power flow equations. A fast post-processing process is developed to enforce the box constraints. The effectiveness of DeepOPF-V is validated by simulations on IEEE 118/300-bus systems and a 2000-bus test system. Compared with existing studies, DeepOPF-V achieves decent computation speedup up to four orders of magnitude and comparable performance in optimality gap and preserving the feasibility of the solution.

AIJul 9, 2020
Multi-Granularity Modularized Network for Abstract Visual Reasoning

Xiangru Tang, Haoyuan Wang, Xiang Pan et al.

Abstract visual reasoning connects mental abilities to the physical world, which is a crucial factor in cognitive development. Most toddlers display sensitivity to this skill, but it is not easy for machines. Aimed at it, we focus on the Raven Progressive Matrices Test, designed to measure cognitive reasoning. Recent work designed some black-boxes to solve it in an end-to-end fashion, but they are incredibly complicated and difficult to explain. Inspired by cognitive studies, we propose a Multi-Granularity Modularized Network (MMoN) to bridge the gap between the processing of raw sensory information and symbolic reasoning. Specifically, it learns modularized reasoning functions to model the semantic rule from the visual grounding in a neuro-symbolic and semi-supervision way. To comprehensively evaluate MMoN, our experiments are conducted on the dataset of both seen and unseen reasoning rules. The result shows that MMoN is well suited for abstract visual reasoning and also explainable on the generalization test.

CVJul 3, 2020
Improving auto-encoder novelty detection using channel attention and entropy minimization

Miao Tian, Dongyan Guo, Ying Cui et al.

Novelty detection is a important research area which mainly solves the classification problem of inliers which usually consists of normal samples and outliers composed of abnormal samples. Auto-encoder is often used for novelty detection. However, the generalization ability of the auto-encoder may cause the undesirable reconstruction of abnormal elements and reduce the identification ability of the model. To solve the problem, we focus on the perspective of better reconstructing the normal samples as well as retaining the unique information of normal samples to improve the performance of auto-encoder for novelty detection. Firstly, we introduce attention mechanism into the task. Under the action of attention mechanism, auto-encoder can pay more attention to the representation of inlier samples through adversarial training. Secondly, we apply the information entropy into the latent layer to make it sparse and constrain the expression of diversity. Experimental results on three public datasets show that the proposed method achieves comparable performance compared with previous popular approaches.

SYJul 2, 2020
DeepOPF: A Feasibility-Optimized Deep Neural Network Approach for AC Optimal Power Flow Problems

Xiang Pan, Minghua Chen, Tianyu Zhao et al.

High percentage penetrations of renewable energy generations introduce significant uncertainty into power systems. It requires grid operators to solve alternative current optimal power flow (AC-OPF) problems more frequently for economical and reliable operation in both transmission and distribution grids. In this paper, we develop a Deep Neural Network (DNN) approach, called DeepOPF, for solving AC-OPF problems in a fraction of the time used by conventional solvers. A key difficulty for applying machine learning techniques for solving AC-OPF problems lies in ensuring that the obtained solutions respect the equality and inequality physical and operational constraints. Generalized the 2-stage procedure in [1], [2], DeepOPF first trains a DNN model to predict a set of independent operating variables and then directly compute the remaining dependable ones by solving power flow equations. Such an approach not only preserves the power-flow balance equality constraints but also reduces the number of variables to predict by the DNN, cutting down the number of neurons and training data needed. DeepOPF then employs a penalty approach with a zero-order gradient estimation technique in the training process to preserve the remaining inequality constraints. As another contribution, we drive a condition for tuning the size of the DNN according to the desired approximation accuracy, which measures the DNN generalization capability. It provides theoretical justification for using DNN to solve the AC-OPF problem. Simulation results of IEEE 30/118/300-bus and a synthetic 2000-bus test cases show that DeepOPF speeds up the computing time by up to two orders of magnitude as compared to a state-of-the-art solver, at the expense of $<$0.1% cost difference.

SYOct 30, 2019
DeepOPF: A Deep Neural Network Approach for Security-Constrained DC Optimal Power Flow

Xiang Pan, Tianyu Zhao, Minghua Chen et al.

We develop DeepOPF as a Deep Neural Network (DNN) approach for solving security-constrained direct current optimal power flow (SC-DCOPF) problems, which are critical for reliable and cost-effective power system operation.DeepOPF is inspired by the observation that solving SC-DCOPF problems for a given power network is equivalent to depicting a high-dimensional mapping from the load inputs to the generation and phase angle outputs. We first train a DNN to learn the mapping and predict the generations from the load inputs. We then directly reconstruct the phase angles from the generations and loads by using the power flow equations. Such a predict-and-reconstruct approach reduces the dimension of the mapping to learn, subsequently cutting down the size of the DNN and the amount of training data needed. We further derive a condition for tuning the size of the DNN according to the desired approximation accuracy of the load-generation mapping. We develop a post-processing procedure based on $\ell_1$-projection to ensure the feasibility of the obtained solution, which can be of independent interest. Simulation results for IEEE test cases show that DeepOPF generates feasible solutions with less than 0.2% optimality loss, while speeding up the computation time by up to two orders of magnitude as compared to a state-of-the-art solver.