Yixin Fang

ME
h-index: 6
9 papers
137 citations
Novelty: 47%
AI Score: 39

9 Papers

OC · Jul 20, 2022
Solving the Batch Stochastic Bin Packing Problem in Cloud: A Chance-constrained Optimization Approach

Jie Yan, Yunlei Lu, Liting Chen et al.

This paper investigates a critical resource allocation problem in the first-party cloud: scheduling containers to machines. There are tens of services and each service runs a set of homogeneous containers with dynamic resource usage; containers of a service are scheduled daily in a batch fashion. This problem can be naturally formulated as a Stochastic Bin Packing Problem (SBPP). However, traditional SBPP research often focuses on cases of empty machines, whose objective, i.e., to minimize the number of used machines, is not well-defined for the more common reality with nonempty machines. This paper aims to close this gap. First, we define a new objective metric, Used Capacity at Confidence (UCaC), which measures the maximum used resources at a given probability and is proved to be consistent for both empty and nonempty machines, and reformulate the SBPP under chance constraints. Second, by modeling the container resource usage distribution in a generative approach, we reveal that UCaC can be approximated with a Gaussian distribution, which is verified by trace data of real-world applications. Third, we propose an exact solver based on the equivalent cutting stock variant, as well as two heuristic solvers: UCaC best fit and bi-level heuristics. We experimentally evaluate these solvers on both synthetic datasets and real application traces, demonstrating our methodology's advantage over a traditional optimal SBPP solver that minimizes the number of used machines, with a low rate of resource violations.
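As a rough illustration of the Gaussian approximation mentioned in this abstract, the sketch below computes a UCaC-style quantile for one machine. It assumes container usages are independent with known means and variances, so their sum is treated as approximately normal; the function names and the independence assumption are illustrative and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def ucac_gaussian(means, variances, confidence=0.99):
    """Approximate the resource usage that is not exceeded with
    probability `confidence` on a single machine.

    Assumption (not from the paper): usages are independent, so the
    total is roughly Normal(sum(means), sum(variances)).
    """
    total_mean = sum(means)
    total_std = sqrt(sum(variances))
    z = NormalDist().inv_cdf(confidence)  # standard normal quantile
    return total_mean + z * total_std


def fits(means, variances, capacity, confidence=0.99):
    """Chance-constraint check: usage stays within capacity at `confidence`."""
    return ucac_gaussian(means, variances, confidence) <= capacity


# Two containers with mean usage 2.0 and variance 0.25 each.
print(ucac_gaussian([2.0, 2.0], [0.25, 0.25], confidence=0.99))
print(fits([2.0, 2.0], [0.25, 0.25], capacity=6.0))
```

Because the same quantile can be computed whether or not a machine already hosts containers, a metric of this shape applies to both empty and nonempty machines, which is the consistency property the abstract emphasizes.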

CV · Nov 27, 2024
SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images

Yanyan Li, Yixin Fang, Federico Tombari et al.

Sparse Multi-view Images can be Learned to predict explicit radiance fields via Generalizable Gaussian Splatting approaches, which can achieve wider application prospects in real life when ground-truth camera parameters are not required as inputs. In this paper, a novel generalizable Gaussian Splatting method, SmileSplat, is proposed to reconstruct pixel-aligned Gaussian surfels for diverse scenarios, requiring only unconstrained sparse multi-view images. First, Gaussian surfels are predicted by a multi-head Gaussian regression decoder; these surfels are represented with fewer degrees of freedom but have better multi-view consistency. Furthermore, the normal vectors of the Gaussian surfels are enhanced using high-quality normal priors. Second, the Gaussians and camera parameters (both extrinsic and intrinsic) are optimized by the proposed Bundle-Adjusting Gaussian Splatting module to obtain high-quality Gaussian radiance fields for novel view synthesis tasks. Extensive experiments on novel view rendering and depth map prediction tasks are conducted on public datasets, demonstrating that the proposed method achieves state-of-the-art performance in various 3D vision tasks. More information can be found on our project page (https://yanyan-li.github.io/project/gs/smilesplat).

ML · Oct 20, 2025
Arbitrated Indirect Treatment Comparisons

Yixin Fang, Weili He

Matching-adjusted indirect comparison (MAIC) has been increasingly employed in health technology assessments (HTA). By reweighting subjects from a trial with individual participant data (IPD) to match the covariate summary statistics of another trial with only aggregate data (AgD), MAIC facilitates the estimation of a treatment effect defined with respect to the AgD trial population. This manuscript introduces a new class of methods, termed arbitrated indirect treatment comparisons, designed to address the "MAIC paradox," a phenomenon highlighted by Jiang et al. (2025). The MAIC paradox arises when different sponsors, analyzing the same data, reach conflicting conclusions regarding which treatment is more effective. The underlying issue is that each sponsor implicitly targets a different population. To resolve this inconsistency, the proposed methods focus on estimating treatment effects in a common target population, specifically chosen to be the overlap population.
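The reweighting step described in this abstract can be sketched for a single covariate. The example below uses the common method-of-moments form of MAIC, with weights proportional to exp(a * x) and a chosen so that the weighted IPD mean equals the AgD mean. The function names, the single-covariate restriction, and the bisection solver are simplifying assumptions for illustration; this is standard MAIC weighting, not the arbitrated method the paper proposes.

```python
from math import exp


def maic_weights(ipd_values, agd_mean, iterations=200):
    """Return normalized weights so the weighted IPD mean matches agd_mean.

    Assumes one covariate and weights of the form exp(a * (x - agd_mean)).
    """
    centered = [x - agd_mean for x in ipd_values]
    if not (min(centered) < 0 < max(centered)):
        raise ValueError("agd_mean must lie strictly inside the IPD range")

    def weights(a):
        logits = [a * d for d in centered]
        top = max(logits)  # subtract the max for numerical stability
        return [exp(v - top) for v in logits]

    def gap(a):
        # Weighted mean of the centered covariate; increasing in a.
        w = weights(a)
        return sum(wi * d for wi, d in zip(w, centered)) / sum(w)

    lo, hi = -1.0, 1.0
    while gap(lo) > 0:  # widen the bracket until the root is enclosed
        lo *= 2
    while gap(hi) < 0:
        hi *= 2
    for _ in range(iterations):  # bisection on the monotone gap
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    w = weights((lo + hi) / 2)
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(normalized_weights):
    """ESS = (sum w)^2 / sum w^2, which is 1 / sum w^2 for normalized weights."""
    return 1.0 / sum(w * w for w in normalized_weights)


ages = [50, 55, 60, 65, 70]          # hypothetical IPD covariate values
w = maic_weights(ages, agd_mean=62)  # hypothetical AgD summary statistic
print(sum(wi * x for wi, x in zip(w, ages)), effective_sample_size(w))
```

Swapping which trial supplies the IPD changes the population the weights target, which is the source of the conflicting conclusions the abstract calls the MAIC paradox.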

ME · Oct 15, 2025
Estimand framework and intercurrent events handling for clinical trials with time-to-event outcomes

Yixin Fang, Man Jin

The ICH E9(R1) guideline presents an estimand framework for clinical trials, proposes five strategies for handling intercurrent events (ICEs), and provides a comprehensive discussion and many real-life clinical examples for quantitative and categorical outcomes. However, ICH E9(R1) lacks a discussion of time-to-event (TTE) outcomes. In this paper, we discuss how to define estimands and how to handle ICEs for clinical trials with TTE outcomes. Specifically, we discuss six ICE handling strategies: the five strategies proposed by ICH E9(R1) and a new strategy, the competing-risk strategy. Compared with ICH E9(R1), the novelty of this paper is three-fold: (1) the estimands are defined in terms of potential outcomes, (2) the methods can utilize time-dependent covariates straightforwardly, and (3) the efficient estimators are discussed accordingly.

CL · Nov 21, 2021
Isomer: Transfer enhanced Dual-Channel Heterogeneous Dependency Attention Network for Aspect-based Sentiment Classification

Yukun Cao, Yijia Tang, Ziyue Wei et al.

Aspect-based sentiment classification aims to predict the sentiment polarity of a specific aspect in a sentence. However, most existing methods encode dependency relations in a homogeneous dependency graph that suffers from sparsity and ambiguity, and that can neither cover the comprehensive contextualized features of short texts nor consider additional node types or semantic relation information. To solve these issues, we present a sentiment analysis model named Isomer, which performs dual-channel attention on heterogeneous dependency graphs incorporating external knowledge, to effectively integrate additional information. Specifically, a transfer-enhanced dual-channel heterogeneous dependency attention network is devised in Isomer to model short texts using heterogeneous dependency graphs. These heterogeneous dependency graphs not only consider different types of information but also incorporate external knowledge. Experimental studies show that our model outperforms recent models on benchmark datasets. Furthermore, the results suggest that our method captures the importance of various information features to focus on informative contextual words.

ML · Jul 1, 2017
On Scalable Inference with Stochastic Gradient Descent

Yixin Fang, Jinfeng Xu, Lei Yang

In many applications involving large datasets or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity due to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators were established decades ago, statistical inference such as interval estimation remains largely unexplored. Traditional resampling methods such as the bootstrap are not computationally feasible since they require repeatedly drawing independent samples from the entire dataset. The plug-in method is not applicable when there are no explicit formulas for the covariance matrix of the estimator. In this paper, we propose a scalable inferential procedure for stochastic gradient descent, which, upon the arrival of each observation, updates the SGD estimate as well as a large number of randomly perturbed SGD estimates. The proposed method is easy to implement in practice. We establish its theoretical properties for a general class of models that includes generalized linear models and quantile regression models as special cases. The finite-sample performance and numerical utility are evaluated by simulation studies and two real data applications.
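The online procedure described in this abstract can be sketched for the simplest case, estimating a mean from a stream. The sketch assumes averaged SGD on squared-error loss, a step size of 0.5 * n^(-0.6), and Exp(1) perturbation weights (mean 1, variance 1) applied to each gradient; these choices, and all names below, are illustrative assumptions rather than details confirmed by the abstract.

```python
import random
from statistics import stdev


def perturbed_sgd_mean(stream, n_perturbed=100, seed=0):
    """One pass over `stream`, updating an averaged SGD estimate of the
    mean together with `n_perturbed` randomly perturbed copies."""
    rng = random.Random(seed)
    theta, theta_bar = 0.0, 0.0
    pert = [0.0] * n_perturbed      # perturbed SGD iterates
    pert_bar = [0.0] * n_perturbed  # their running averages
    for n, x in enumerate(stream, start=1):
        lr = 0.5 * n ** -0.6
        # Gradient of 0.5 * (theta - x)^2 is (theta - x).
        theta -= lr * (theta - x)
        theta_bar += (theta - theta_bar) / n
        for b in range(n_perturbed):
            w = rng.expovariate(1.0)  # random weight on this observation
            pert[b] -= lr * w * (pert[b] - x)
            pert_bar[b] += (pert[b] - pert_bar[b]) / n
    return theta_bar, pert_bar


data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(2000)]  # synthetic stream
estimate, replicates = perturbed_sgd_mean(data)
se = stdev(replicates)  # spread of the perturbed estimates
interval = (estimate - 1.96 * se, estimate + 1.96 * se)
print(estimate, interval)
```

Each observation is touched once and then discarded, so memory grows with the number of perturbed copies rather than with the sample size, in contrast to a bootstrap that resamples the full dataset.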

ME · Jan 18, 2016
Sparse Convex Clustering

Binhuan Wang, Yilong Zhang, Will Wei Sun et al.

Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attention since it nicely addresses the instability issue of traditional nonconvex clustering methods. Although its computational and statistical properties have been recently studied, the performance of convex clustering has not yet been investigated in the high-dimensional clustering scenario, where the data contain a large number of features and many of them carry no information about the clustering structure. In this paper, we demonstrate that the performance of convex clustering can be distorted when uninformative features are included in the clustering. To overcome this, we introduce a new clustering method, referred to as Sparse Convex Clustering, to simultaneously cluster observations and conduct feature selection. The key idea is to formulate convex clustering in the form of regularization, with an adaptive group-lasso penalty term on cluster centers. In order to optimally balance the tradeoff between cluster fitting and sparsity, a tuning criterion based on clustering stability is developed. In theory, we provide an unbiased estimator for the degrees of freedom of the proposed sparse convex clustering method. Finally, the effectiveness of sparse convex clustering is examined through a variety of numerical experiments and a real data application.

ME · Jan 30, 2013
A note on selection stability: combining stability and prediction

Yixin Fang, Junhui Wang, Wei Sun

Recently, many regularized procedures have been proposed for variable selection in linear regression, but their performance depends on the tuning parameter selection. Here a criterion for tuning parameter selection is proposed that combines the strengths of stability selection and cross-validation and is therefore referred to as prediction and stability selection (PASS). Selection consistency is established assuming the data-generating model is a subset of the full model, and the small-sample performance is demonstrated through simulation studies in which the assumption either holds or is violated.

ML · Aug 16, 2012
Consistent selection of tuning parameters via variable selection stability

Wei Sun, Junhui Wang, Yixin Fang

Penalized regression models are widely used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in the literature, their performance largely depends on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross-validation, AIC, and BIC. This article introduces a general tuning parameter selection criterion based on a novel concept of variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. The effectiveness of the proposed criterion is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data.