Shen-Huan Lyu

LG
h-index13
14papers
51citations
Novelty55%
AI Score50

14 Papers

LGMar 4
On the Learnability of Offline Model-Based Optimization: A Ranking Perspective

Shen-Huan Lyu, Rong-Xi Tan, Ke Xue et al.

Offline model-based optimization (MBO) seeks to discover high-performing designs using only a fixed dataset of past evaluations. Most existing methods rely on learning a surrogate model via regression and implicitly assume that good predictive accuracy leads to good optimization performance. In this work, we challenge this assumption and study offline MBO from a learnability perspective. We argue that offline optimization is fundamentally a problem of ranking high-quality designs rather than accurate value prediction. Specifically, we introduce an optimization-oriented risk based on ranking between near-optimal and suboptimal designs, and develop a unified theoretical framework that connects surrogate learning to final optimization. We prove the theoretical advantages of ranking over regression, and identify distributional mismatch between the training data and near-optimal designs as the dominant error. Inspired by this, we design a distribution-aware ranking method to reduce this mismatch. Empirical results across various tasks show that our approach outperforms twenty existing methods, validating our theoretical findings. Additionally, both theoretical and empirical results reveal intrinsic limitations in offline MBO, showing a regime in which no offline method can avoid over-optimistic extrapolation.

LGAug 25, 2024
Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning

Wenxuan Zhou, Zhihao Qu, Shen-Huan Lyu et al.

This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios where resource-constrained devices are involved in large-scale model training. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates and diminish the generalization capabilities of the resulting models. Our theoretical analysis provides insights into how compression errors critically hinder SL performance, which previous methodologies underestimate. To address these challenges, we employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity. Supported by rigorous theoretical analysis, our framework significantly reduces compression errors and accelerates the convergence. Extensive experiments also verify that our method outperforms existing solutions regarding training efficiency and communication complexity.

LGJul 6, 2024
The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest

Shen-Huan Lyu, Jin-Hui Wu, Qin-Cheng Zheng et al.

Random forests are classical ensemble algorithms that construct multiple randomized decision trees and aggregate their predictions using naive averaging. \citet{zhou2019deep} further propose a deep forest algorithm with multi-layer forests, which outperforms random forests in various tasks. The performance of deep forests is related to three hyperparameters in practice: depth, width, and tree size, but little has been known about its theoretical explanation. This work provides the first upper and lower bounds on the approximation complexity of deep forests concerning the three hyperparameters. Our results confirm the distinctive role of depth, which can exponentially enhance the expressiveness of deep forests compared with width and tree size. Experiments confirm the theoretical findings.

LGFeb 6
Enhance and Reuse: A Dual-Mechanism Approach to Boost Deep Forest for Label Distribution Learning

Jia-Le Xu, Shen-Huan Lyu, Yu-Nian Wang et al.

Label distribution learning (LDL) requires the learner to predict the degree of correlation between each sample and each label. To achieve this, a crucial task during learning is to leverage the correlation among labels. Deep Forest (DF) is a deep learning framework based on tree ensembles, whose training phase does not rely on backpropagation. DF performs in-model feature transform using the prediction of each layer and achieves competitive performance on many tasks. However, its exploration in the field of LDL is still in its infancy. The few existing methods that apply DF to the field of LDL do not have effective ways to utilize the correlation among labels. Therefore, we propose a method named Enhanced and Reused Feature Deep Forest (ERDF). It mainly contains two mechanisms: feature enhancement exploiting label correlation and measure-aware feature reuse. The first one is to utilize the correlation among labels to enhance the original features, enabling the samples to acquire more comprehensive information for the task of LDL. The second one performs a reuse operation on the features of samples that perform worse than the previous layer on the validation set, in order to ensure the stability of the training process. This kind of Enhance-Reuse pattern not only enables samples to enrich their features but also validates the effectiveness of their new features and conducts a reuse process to prevent the noise from spreading further. Experiments show that our method outperforms other comparison algorithms on six evaluation metrics.

NEMar 8
CDEoH: Category-Driven Automatic Algorithm Design With Large Language Models

Yu-Nian Wang, Shen-Huan Lyu, Ning Chen et al.

With the rapid advancement of large language models (LLMs), LLM-based heuristic search methods have demonstrated strong capabilities in automated algorithm generation. However, their evolutionary processes often suffer from instability and premature convergence. Existing approaches mainly address this issue through prompt engineering or by jointly evolving thought and code, while largely overlooking the critical role of algorithmic category diversity in maintaining evolutionary stability. To this end, we propose Category Driven Automatic Algorithm Design with Large Language Models (CDEoH), which explicitly models algorithm categories and jointly balances performance and category diversity in population management, enabling parallel exploration across multiple algorithmic paradigms. Extensive experiments on representative combinatorial optimization problems across multiple scales demonstrate that CDEoH effectively mitigates convergence toward a single evolutionary direction, significantly enhancing evolutionary stability and achieving consistently superior average performance across tasks and scales.

LGMar 3
Breaking the Prototype Bias Loop: Confidence-Aware Federated Contrastive Learning for Highly Imbalanced Clients

Tian-Shuang Wu, Shen-Huan Lyu, Ning Chen et al.

Local class imbalance and data heterogeneity across clients often trap prototype-based federated contrastive learning in a prototype bias loop: biased local prototypes induced by imbalanced data are aggregated into biased global prototypes, which are repeatedly reused as contrastive anchors, accumulating errors across communication rounds. To break this loop, we propose Confidence-Aware Federated Contrastive Learning (CAFedCL), a novel framework that improves the prototype aggregation mechanism and strengthens the contrastive alignment guided by prototypes. CAFedCL employs a confidence-aware aggregation mechanism that leverages predictive uncertainty to downweight high-variance local prototypes. In addition, generative augmentation for minority classes and geometric consistency regularization are integrated to stabilize the structure between classes. From a theoretical perspective, we provide an expectation-based analysis showing that our aggregation reduces estimation variance, thereby bounding global prototype drift and ensuring convergence. Extensive experiments under varying levels of class imbalance and data heterogeneity demonstrate that CAFedCL consistently outperforms representative federated baselines in both accuracy and client fairness.

LGOct 15, 2024
Offline Model-Based Optimization by Learning to Rank

Rong-Xi Tan, Ke Xue, Shen-Huan Lyu et al.

Offline model-based optimization (MBO) aims to identify a design that maximizes a black-box function using only a fixed, pre-collected dataset of designs and their corresponding scores. A common approach in offline MBO is to train a regression-based surrogate model by minimizing mean squared error (MSE) and then find the best design within this surrogate model by different optimizers (e.g., gradient ascent). However, a critical challenge is the risk of out-of-distribution errors, i.e., the surrogate model may typically overestimate the scores and mislead the optimizers into suboptimal regions. Prior works have attempted to address this issue in various ways, such as using regularization techniques and ensemble learning to enhance the robustness of the model, but it still remains. In this paper, we argue that regression models trained with MSE are not well-aligned with the primary goal of offline MBO, which is to select promising designs rather than to predict their scores precisely. Notably, if a surrogate model can maintain the order of candidate designs based on their relative score relationships, it can produce the best designs even without precise predictions. To validate it, we conduct experiments to compare the relationship between the quality of the final designs and MSE, finding that the correlation is really very weak. In contrast, a metric that measures order-maintaining quality shows a significantly stronger correlation. Based on this observation, we propose learning a ranking-based model that leverages learning to rank techniques to prioritize promising designs based on their relative scores. We show that the generalization error on ranking loss can be well bounded. Empirical results across diverse tasks demonstrate the superior performance of our proposed ranking-based models than twenty existing methods.

LGJan 31, 2025
Improving Multi-Label Contrastive Learning by Leveraging Label Distribution

Ning Chen, Shen-Huan Lyu, Tian-Shuang Wu et al.

In multi-label learning, leveraging contrastive learning to learn better representations faces a key challenge: selecting positive and negative samples and effectively utilizing label information. Previous studies selected positive and negative samples based on the overlap between labels and used them for label-wise loss balancing. However, these methods suffer from a complex selection process and fail to account for the varying importance of different labels. To address these problems, we propose a novel method that improves multi-label contrastive learning through label distribution. Specifically, when selecting positive and negative samples, we only need to consider whether there is an intersection between labels. To model the relationships between labels, we introduce two methods to recover label distributions from logical labels, based on Radial Basis Function (RBF) and contrastive loss, respectively. We evaluate our method on nine widely used multi-label datasets, including image and vector datasets. The results demonstrate that our method outperforms state-of-the-art methods in six evaluation metrics.

LGMay 19, 2025
Theoretical Investigation on Inductive Bias of Isolation Forest

Qin-Cheng Zheng, Shao-Qun Zhang, Shen-Huan Lyu et al.

Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector, primarily owing to its remarkable runtime efficiency and superior performance in large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper focuses on the inductive bias of iForest, which theoretically elucidates under what circumstances and to what extent iForest works well. The key is to formulate the growth process of iForest, where the split dimensions and split values are randomly selected. We model the growth process of iForest as a random walk, enabling us to derive the expected depth function, which is the outcome of iForest, using transition probabilities. The case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.

LGFeb 9, 2025
Compressing Model with Few Class-Imbalance Samples: An Out-of-Distribution Expedition

Tian-Shuang Wu, Shen-Huan Lyu, Ning Chen et al.

In recent years, as a compromise between privacy and performance, few-sample model compression has been widely adopted to deal with limited data resulting from privacy and security concerns. However, when the number of available samples is extremely limited, class imbalance becomes a common and tricky problem. Achieving an equal number of samples across all classes is often costly and impractical in real-world applications, and previous studies on few-sample model compression have mostly ignored this significant issue. Our experiments comprehensively demonstrate that class imbalance negatively affects the overall performance of few-sample model compression methods. To address this problem, we propose a novel and adaptive framework named OOD-Enhanced Few-Sample Model Compression (OE-FSMC). This framework integrates easily accessible out-of-distribution (OOD) data into both the compression and fine-tuning processes, effectively rebalancing the training distribution. We also incorporate a joint distillation loss and a regularization term to reduce the risk of the model overfitting to the OOD data. Extensive experiments on multiple benchmark datasets show that our framework can be seamlessly incorporated into existing few-sample model compression methods, effectively mitigating the accuracy degradation caused by class imbalance.

LGFeb 1, 2025
Enhance Learning Efficiency of Oblique Decision Tree via Feature Concatenation

Shen-Huan Lyu, Yi-Xiao He, Yanyan Wang et al.

Oblique Decision Tree (ODT) separates the feature space by linear projections, as opposed to the conventional Decision Tree (DT) that forces axis-parallel splits. ODT has been proven to have a stronger representation ability than DT, as it provides a way to create shallower tree structures while still approximating complex decision boundaries. However, its learning efficiency is still insufficient, since the linear projections cannot be transmitted to the child nodes, resulting in a waste of model parameters. In this work, we propose an enhanced ODT method with Feature Concatenation (\texttt{FC-ODT}), which enables in-model feature transformation to transmit the projections along the decision paths. Theoretically, we prove that our method enjoys a faster consistency rate w.r.t. the tree depth, indicating that our method possesses a significant advantage in generalization performance, especially for shallow trees. Experiments show that \texttt{FC-ODT} can outperform the other state-of-the-art decision trees with a limited tree depth.

LGJun 7, 2024
Confidence-aware Contrastive Learning for Selective Classification

Yu-Chang Wu, Shen-Huan Lyu, Haopu Shang et al.

Selective classification enables models to make predictions only when they are sufficiently confident, aiming to enhance safety and reliability, which is important in high-stakes scenarios. Previous methods mainly use deep neural networks and focus on modifying the architecture of classification layers to enable the model to estimate the confidence of its prediction. This work provides a generalization bound for selective classification, disclosing that optimizing feature layers helps improve the performance of selective classification. Inspired by this theory, we propose to explicitly improve the selective classification model at the feature level for the first time, leading to a novel Confidence-aware Contrastive Learning method for Selective Classification, CCL-SC, which similarizes the features of homogeneous instances and differentiates the features of heterogeneous instances, with the strength controlled by the model's confidence. The experimental results on typical datasets, i.e., CIFAR-10, CIFAR-100, CelebA, and ImageNet, show that CCL-SC achieves significantly lower selective risk than state-of-the-art methods, across almost all coverage degrees. Moreover, it can be combined with existing methods to bring further improvement.

LGMay 1, 2023
Interpreting Deep Forest through Feature Contribution and MDI Feature Importance

Yi-Xiao He, Shen-Huan Lyu, Yuan Jiang

Deep forest is a non-differentiable deep model which has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of the application fields prefer explainable models, such as random forests with feature contributions that can provide local explanation for each prediction, and Mean Decrease Impurity (MDI) that can provide global feature importance. However, deep forest, as a cascade of random forests, possesses interpretability only at the first layer. From the second layer on, many of the tree splits occur on the new features generated by the previous layer, which makes existing explanatory tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose our feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated data and real world data verify the effectiveness of our methods.

LGDec 27, 2018
Improving Generalization of Deep Neural Networks by Leveraging Margin Distribution

Shen-Huan Lyu, Lu Wang, Zhi-Hua Zhou

Recent research has used margin theory to analyze the generalization performance for deep neural networks (DNNs). The existed results are almost based on the spectrally-normalized minimum margin. However, optimizing the minimum margin ignores a mass of information about the entire margin distribution, which is crucial to generalization performance. In this paper, we prove a generalization upper bound dominated by the statistics of the entire margin distribution. Compared with the minimum margin bounds, our bound highlights an important measure for controlling the complexity, which is the ratio of the margin standard deviation to the expected margin. We utilize a convex margin distribution loss function on the deep neural networks to validate our theoretical results by optimizing the margin ratio. Experiments and visualizations confirm the effectiveness of our approach and the correlation between generalization gap and margin ratio.