84.7 LG Jun 2
Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning
Hu Xu, Zhaolong Xing, Congcong Liu et al.
Post-training pruning compresses large language models to high sparsity using a small unlabelled calibration set, and recent work has concluded that the choice of calibration source has only modest impact on averaged post-pruning accuracy. We ask whether this conclusion survives once calibration impact is evaluated separately across distinct capability dimensions rather than aggregated. Decomposing post-pruning capability into General, Commonsense, Code, and Math, and analysing $n{=}15$ calibration sources via Spearman correlations between OIT information metrics and per-dimension retention, we uncover an opposite-sign trade-off: calibration perplexity correlates positively with General retention ($\rho{=}{+}0.71$) but negatively with Math and Code retention ($\rho{=}{-}0.53,\,{-}0.59$; $p{<}0.05$), so no single source can preserve all capabilities. We respond with multi-source calibration mixing, and propose IGSP, an information-guided self-calibration protocol that automates multi-source construction without capability-aligned corpora by minimising 4-gram aggregation and balancing perplexity across dimensions. On LLaMA-3.1-8B at SparseGPT 60% sparsity, a uniform multi-source mix reaches 58.8% total retention, outperforming the best single source (MetaMath, 50.0%) by $+8.8$ and the C4 default (40.0%) by $+18.8$; IGSP improves over Self-Cal by $+2.4$ and SGS by $+4.8$.
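The opposite-sign trade-off described above rests on Spearman rank correlation, which can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the perplexity and retention values below are invented placeholders (five hypothetical calibration sources rather than the paper's fifteen), and the helper names `average_ranks` and `spearman` are assumptions made for this sketch.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical numbers for illustration only (NOT results from the paper).
perplexity = [5.0, 8.0, 12.0, 20.0, 35.0]         # one value per calibration source
general_retention = [30.0, 35.0, 33.0, 45.0, 50.0]
math_retention = [60.0, 48.0, 50.0, 30.0, 25.0]

print(f"General: {spearman(perplexity, general_retention):+.2f}")
print(f"Math:    {spearman(perplexity, math_retention):+.2f}")
```

With these placeholder values the two correlations have opposite signs, which is the pattern the abstract reports: a source whose perplexity favours one dimension tends to hurt the other.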
RO Sep 9, 2024 Code
GOPT: Generalizable Online 3D Bin Packing via Transformer-based Deep Reinforcement Learning
Heng Xiong, Changrong Guo, Jian Peng et al.
Robotic object packing has broad practical applications in the logistics and automation industry, often formulated by researchers as the online 3D Bin Packing Problem (3D-BPP). However, existing DRL-based methods primarily focus on enhancing performance in limited packing environments while neglecting the ability to generalize across multiple environments characterized by different bin dimensions. To this end, we propose GOPT, a generalizable online 3D Bin Packing approach via Transformer-based deep reinforcement learning (DRL). First, we design a Placement Generator module to yield finite subspaces as placement candidates and the representation of the bin. Second, we propose a Packing Transformer, which fuses the features of the items and bin, to identify the spatial correlation between the item to be packed and available sub-spaces within the bin. Coupling these two components enables GOPT's ability to perform inference on bins of varying dimensions. We conduct extensive experiments and demonstrate that GOPT not only achieves superior performance against the baselines, but also exhibits excellent generalization capabilities. Furthermore, the deployment with a robot showcases the practical applicability of our method in the real world. The source code will be publicly available at https://github.com/Xiong5Heng/GOPT.
CV Jan 20 Code
DiffFace-Edit: A Diffusion-Based Facial Dataset for Forgery-Semantic Driven Deepfake Detection Analysis
Feng Ding, Wenhui Yi, Xinan He et al.
Generative models now produce imperceptible, fine-grained manipulated faces, posing significant privacy risks. However, existing AI-generated face datasets generally lack focus on samples with fine-grained regional manipulations. Furthermore, no researchers have yet studied the real impact of splice attacks, which occur between real and manipulated samples, on detectors. We refer to these as detector-evasive samples. Based on this, we introduce the DiffFace-Edit dataset, which has the following advantages: 1) It contains over two million AI-generated fake images. 2) It features edits across eight facial regions (e.g., eyes, nose) and includes a richer variety of editing combinations, such as single-region and multi-region edits. Additionally, we specifically analyze the impact of detector-evasive samples on detection models. We conduct a comprehensive analysis of the dataset and propose a cross-domain evaluation that combines IMDL methods. Dataset will be available at https://github.com/ywh1093/DiffFace-Edit.
CV Nov 22, 2024 Code
FairAdapter: Detecting AI-generated Images with Improved Fairness
Feng Ding, Jun Zhang, Xinan He et al.
The high-quality, realistic images produced by generative models are difficult to expose as synthetic. So far, data-driven deep neural networks have been regarded as the most efficient forensic tools for this challenge. However, they may overfit to certain semantics, resulting in considerable inconsistency in detection performance across generated samples with different content. This can be regarded as an issue of detection fairness. In this paper, we propose a novel framework named FairAdapter to tackle the issue. Compared with existing state-of-the-art methods, our model achieves improved fairness performance. Our project: https://github.com/AppleDogDog/FairnessDetection
CL Nov 20, 2019 Code
CAIL2019-SCM: A Dataset of Similar Case Matching in Legal Domain
Chaojun Xiao, Haoxi Zhong, Zhipeng Guo et al.
In this paper, we introduce CAIL2019-SCM, Chinese AI and Law 2019 Similar Case Matching dataset. CAIL2019-SCM contains 8,964 triplets of cases published by the Supreme People's Court of China. CAIL2019-SCM focuses on detecting similar cases, and the participants are required to check which two cases are more similar in the triplets. There are 711 teams who participated in this year's competition, and the best team has reached a score of 71.88. We have also implemented several baselines to help researchers better understand this task. The dataset and more details can be found from https://github.com/china-ai-law-challenge/CAIL2019/tree/master/scm.
48.4 IT Apr 20
Rate-Distortion Theory for Deductive Sources under Closure Fidelity
Jianfeng Xu
We study lossy compression of a finite statement source generated in a fixed deductive environment. The source symbols are statements in a knowledge base endowed with a proof system, and reconstruction fidelity is measured by preservation of deductive closure rather than by symbolwise equality. This induces, once the proof system and canonical order are fixed, a decomposition of the source into an irredundant core and redundant stored consequences. Under a natural disjointness condition on zero-distortion reconstruction sets, we show that the minimum zero-distortion rate equals the source mass of the core times the entropy of the source conditioned on that core. For reconstruction alphabets contained in the deductive closure of the source knowledge base, we further prove that the full rate-distortion function depends only on the core, so redundant states are invisible to both rate and distortion. When the decoder is limited to a bounded number of inference steps, we obtain an exact fixed depth rate-delay-distortion characterization. Under an additional order-robustness assumption identifying the chosen core with the order-free essential set, this characterization interpolates between classical symbolwise compression and unconstrained deductive compression. These results formulate deductive compression as a structured source coding problem and quantify how shared inference structure changes the fundamental limits of communication.
83.4 IT Apr 13
Semantic Rate-Distortion Theory: Deductive Compression and Closure Fidelity
Jianfeng Xu
Shannon's rate-distortion theory treats source symbols as unstructured labels. When the source is a knowledge base equipped with a logical proof system, a natural fidelity criterion is closure fidelity: a reconstruction is acceptable if it preserves the deductive closure of the original. This paper develops a rate-distortion theory under this criterion. Central to the theory is the irredundant core, a canonical generating set extracted by a fixed-order deletion procedure, from which the full deductive closure can be rederived. We prove that the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states. More generally, the full semantic rate-distortion function depends only on the core; redundant states are invisible to both rate and distortion. We derive a semantic source-channel separation theorem showing a semantic leverage phenomenon: under closure fidelity, the required source rate is reduced by an asymptotic leverage factor greater than one, allowing the same knowledge base to be communicated with proportionally fewer channel uses, not by violating Shannon capacity, but because redundant states become free. We also prove a strengthened Fano inequality that exploits core structure. For heterogeneous multi-agent communication, an overlap decomposition gives necessary and sufficient conditions for closure-reliable transmission and identifies a semantic bottleneck in broadcast settings that persists even over noiseless channels. All results are verified on Datalog instances with up to 24,000 base facts.
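One way to picture a fixed-order deletion procedure is the toy sketch below. It is an illustration under stated assumptions, not the paper's construction: the proof system is reduced to propositional Horn rules of the form (set of premises, conclusion), closure is computed by naive forward chaining, and the function names `closure` and `irredundant_core` are invented for this example.

```python
def closure(facts, rules):
    """Forward-chain Horn rules (premises, conclusion) until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


def irredundant_core(statements, rules):
    """Visit statements in the given fixed order and delete each one that
    is still derivable from the statements currently kept."""
    core = list(statements)
    for s in statements:
        rest = [t for t in core if t != s]
        if s in closure(rest, rules):
            core = rest  # s is a redundant stored consequence
    return core


# Hypothetical knowledge base: c follows from a and b, d follows from c.
rules = [({"a", "b"}, "c"), ({"c"}, "d")]
statements = ["a", "b", "c", "d"]

core = irredundant_core(statements, rules)
print(core)
# The core regenerates the same deductive closure as the full knowledge base.
print(closure(core, rules) == closure(statements, rules))
```

The result can depend on the visiting order: with the mutually derivable rules `({"a"}, "b")` and `({"b"}, "a")`, the order `["a", "b"]` keeps `b` while `["b", "a"]` keeps `a`, which is why the order has to be fixed for the core to be canonical.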
78.3 LO Apr 10
Semantic Channel Theory: Deductive Compression and Structural Fidelity for Multi-Agent Communication
Jianfeng Xu
Shannon's information theory deliberately excludes message semantics. This paper develops a rigorous framework for semantic communication that integrates formal proof systems with Shannon-theoretic tools. We introduce an axiomatic information model comprising Lsem-definable state sets linked by computable enabling maps, and define the semantic channel as a composition of Markov kernels whose supports respect the enabling structure. A fixed proof system induces an irredundant semantic core and a derivation-depth stratification, enabling four distortion measures of increasing semantic depth: Hamming, closure, depth, and a parameterized composite. Six families of computable semantic channel invariants are defined and their inter-relationships established, including a data processing bound, a semantic Fano bound, and an ideal-channel collapse theorem. The central quantitative result is a deductive compression gain: under closure-based fidelity, the minimum block length is determined by the irredundant core size rather than the full knowledge-base size. We instantiate the framework for heterogeneous multi-agent communication, introducing an overlap decomposition that yields necessary and sufficient conditions for closure-reliable communication. A semantic bottleneck phenomenon is identified in broadcast settings: vocabulary mismatch imposes irreducible fidelity limitations even over noiseless carriers. All results are verified on an explicit Datalog instance.
LO May 19, 2025
Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping
Jianfeng Xu
This paper addresses the current lack of a unified formal framework in machine learning theory, as well as the absence of robust theoretical foundations for interpretability and ethical safety assurance. We first construct a formal information model, employing sets of well-formed formulas (WFFs) to explicitly define the ontological states and carrier mappings for the core components of machine learning. By introducing learnable and processable predicates, as well as learning and processing functions, we analyze the logical inference and constraint rules underlying causal chains in models, thereby establishing the Machine Learning Theory Meta-Framework (MLT-MF). Building upon this framework, we propose universal definitions for model interpretability and ethical safety, and rigorously prove and validate four key theorems: the equivalence between model interpretability and information existence, the constructive formulation of ethical safety assurance and two types of total variation distance (TVD) upper bounds. This work overcomes the limitations of previous fragmented approaches, providing a unified theoretical foundation from an information science perspective to systematically address the critical challenges currently facing machine learning.
LG Jan 2, 2025
General Information Metrics for Improving AI Model Training Efficiency
Jianfeng Xu, Congcong Liu, Xiaoying Tan et al.
To address the growing size of AI model training data and the lack of a universal data selection methodology, factors that significantly drive up training costs, this paper presents the General Information Metrics Evaluation (GIME) method. GIME leverages general information metrics from Objective Information Theory (OIT), including volume, delay, scope, granularity, variety, duration, sampling rate, aggregation, coverage, distortion, and mismatch, to optimize dataset selection for training purposes. Comprehensive experiments conducted across diverse domains, such as CTR Prediction, Civil Case Prediction, and Weather Forecasting, demonstrate that GIME effectively preserves model performance while substantially reducing both training time and costs. Additionally, applying GIME within the Judicial AI Program led to a 39.56% reduction in total model training expenses, underscoring its potential to support efficient and sustainable AI development.
IT Nov 24, 2025
Information Physics of Intelligence: Unifying Logical Depth and Entropy under Thermodynamic Constraints
Jianfeng Xu, Zeyan Li
The rapid scaling of artificial intelligence models has revealed a fundamental tension between model capacity (storage) and inference efficiency (computation). While classical information theory focuses on transmission and storage limits, it lacks a unified physical framework to quantify the thermodynamic costs of generating information from compressed laws versus retrieving it from memory. In this paper, we propose a theoretical framework that treats information processing as an enabling mapping from ontological states to carrier states. We introduce a novel metric, Derivation Entropy, which quantifies the effective work required to compute a target state from a given logical depth. By analyzing the interplay between Shannon entropy (storage) and computational complexity (time/energy), we demonstrate the existence of a critical phase transition point. Below this threshold, memory retrieval is thermodynamically favorable; above it, generative computation becomes the optimal strategy. This "Energy-Time-Space" conservation law provides a physical explanation for the efficiency of generative models and offers a rigorous mathematical bound for designing next-generation, energy-efficient AI architectures. Our findings suggest that the minimization of Derivation Entropy is a governing principle for the evolution of both biological and artificial intelligence.
IV Jun 1, 2020
Residual Squeeze-and-Excitation Network for Fast Image Deraining
Jun Fu, Jianfeng Xu, Kazuyuki Tasaka et al.
Image deraining is an important image processing task, as rain streaks not only severely degrade the visual quality of images but also significantly affect the performance of high-level vision tasks. Traditional methods progressively remove rain streaks via different recurrent neural networks. However, these methods fail to yield plausible rain-free images efficiently. In this paper, we propose a residual squeeze-and-excitation network called RSEN that achieves fast image deraining as well as superior deraining performance compared with state-of-the-art approaches. Specifically, RSEN adopts a lightweight encoder-decoder architecture to conduct rain removal in one stage. In addition, both the encoder and the decoder adopt a novel residual squeeze-and-excitation block as the core of feature extraction, which contains a residual block for producing hierarchical features, followed by a squeeze-and-excitation block for channel-wise enhancement of the resulting hierarchical features. Experimental results demonstrate that our method can not only considerably reduce the computational complexity but also significantly improve the deraining performance compared with state-of-the-art methods.
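The channel-wise enhancement step can be illustrated with a generic squeeze-and-excitation block. The sketch below is a plain-Python toy under several assumptions: it covers only the standard squeeze, excitation, and scale stages (the preceding residual convolutions are omitted), the weights are hypothetical placeholders, and nothing here is taken from the RSEN implementation, whose block may differ in detail.

```python
from math import exp


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_block(feature_map, w1, b1, w2, b2):
    """Rescale each channel of a C x H x W feature map stored as nested lists."""
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid gates.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [1.0 / (1.0 + exp(-g)) for g in dense(hidden, w2, b2)]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[x * g for x in row] for row in ch]
            for ch, g in zip(feature_map, gates)]


# Toy input: 2 channels, 1 x 2 spatial size; hypothetical weights (1 hidden unit).
features = [[[2.0, 4.0]], [[6.0, 8.0]]]
w1, b1 = [[0.5, -0.25]], [0.0]         # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]   # 1 hidden unit -> 2 channel gates

print(se_block(features, w1, b1, w2, b2))
```

Because each gate lies strictly between 0 and 1, the block can only attenuate channels relative to one another; in trained networks this lets the model emphasise informative channels at the cost of two small fully connected layers.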
CV Mar 25, 2020
Prior-enlightened and Motion-robust Video Deblurring
Ya Zhou, Jianfeng Xu, Kazuyuki Tasaka et al.
Blur distortions in video degrade both human viewing and video-based applications, which makes motion-robust deblurring methods urgently needed. Most existing works depend strongly on their training datasets and generalize poorly to challenging scenarios, such as blur in low-contrast or severe-motion areas and non-uniform blur. Therefore, we propose a PRiOr-enlightened and MOTION-robust video deblurring model (PROMOTION) suitable for challenging blurs. On the one hand, we use 3D group convolution to efficiently encode heterogeneous prior information, explicitly enhancing scene perception while mitigating artifacts in the output. On the other hand, we design priors that represent the blur distribution, to better handle non-uniform blur in the spatio-temporal domain. Beyond the classical global blur caused by camera shake, we also demonstrate generalization to downstream tasks suffering from local blur. Extensive experiments demonstrate that we achieve state-of-the-art performance on the well-known REDS and GoPro datasets and yield gains on machine tasks.
CV Mar 12, 2020
SASL: Saliency-Adaptive Sparsity Learning for Neural Network Acceleration
Jun Shi, Jianfeng Xu, Kazuyuki Tasaka et al.
Accelerating the inference speed of CNNs is critical to their deployment in real-world applications. Among all the pruning approaches, those implementing a sparsity learning framework have been shown to be effective, as they learn and prune the models in an end-to-end data-driven manner. However, these works impose the same sparsity regularization on all filters indiscriminately, which can hardly result in an optimal structure-sparse network. In this paper, we propose a Saliency-Adaptive Sparsity Learning (SASL) approach for further optimization. A novel and effective estimate for each filter, i.e., saliency, is designed, which is measured from two aspects: the importance for the prediction performance and the consumed computational resources. During sparsity learning, the regularization strength is adjusted according to the saliency, so our optimized format can better preserve the prediction performance while zeroing out more computation-heavy filters. The saliency calculation introduces minimal overhead to the training process, which makes SASL very efficient. During the pruning phase, in order to optimize the proposed data-dependent criterion, a hard sample mining strategy is utilized, which shows higher effectiveness and efficiency. Extensive experiments demonstrate the superior performance of our method. Notably, on the ILSVRC-2012 dataset, our approach can reduce the FLOPs of ResNet-50 by 49.7% with negligible degradation of 0.39% in top-1 and 0.05% in top-5 accuracy.
AI Oct 13, 2018
Overview of CAIL2018: Legal Judgment Prediction Competition
Haoxi Zhong, Chaojun Xiao, Zhipeng Guo et al.
In this paper, we give an overview of the Legal Judgment Prediction (LJP) competition at the Chinese AI and Law challenge (CAIL2018). This competition focuses on LJP, which aims to predict the judgment results according to the given facts. Specifically, in CAIL2018, we proposed three subtasks of LJP for the contestants, i.e., predicting relevant law articles, charges and prison terms given the fact descriptions. CAIL2018 attracted several hundred participants (601 teams, 1,144 contestants from 269 organizations). In this paper, we provide a detailed overview of the task definition, related works, outstanding methods and competition results in CAIL2018.
CL Jul 4, 2018
CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction
Chaojun Xiao, Haoxi Zhong, Zhipeng Guo et al.
In this paper, we introduce the Chinese AI and Law challenge dataset (CAIL2018), the first large-scale Chinese legal dataset for judgment prediction. CAIL2018 contains more than 2.6 million criminal cases published by the Supreme People's Court of China, which is several times larger than other datasets in existing works on judgment prediction. Moreover, the annotations of judgment results are more detailed and rich. They consist of applicable law articles, charges, and prison terms, which are expected to be inferred according to the fact descriptions of cases. For comparison, we implement several conventional text classification baselines for judgment prediction, and experimental results show that it is still a challenge for current models to predict the judgment results of legal cases, especially on prison terms. To help researchers make improvements on legal judgment prediction, both CAIL2018 and the baselines will be released after the CAIL competition (http://cail.cipsc.org.cn/).