Xingwei Hu

ML
5 papers
12 citations

5 Papers

ML · Nov 14, 2021
Decoding Causality by Fictitious VAR Modeling

Xingwei Hu

In modeling multivariate time series for either forecasting or policy analysis, it is beneficial to first identify the cause-effect relations within the data. Regression analysis, however, generally captures correlation, and very little research has focused on variance analysis for causality discovery. We first set up an equilibrium for the cause-effect relations using a fictitious vector autoregressive (VAR) model. In the equilibrium, long-run relations are distinguished from noise, and spurious ones are negligibly close to zero. The solution, called the causality distribution, measures the relative strength with which each series causes the movement of all series or of specific affected ones. If a group of exogenous data affects the others but not vice versa, then, in theory, the causality distribution for the other variables is necessarily zero. The hypothesis test of zero causality is the rule for deciding whether a variable is endogenous. In simulation studies, our new approach identifies the true cause-effect relations among the data with high accuracy. We also apply the approach to estimate the contribution of causal factors to climate change.
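To make the exogenous-versus-endogenous idea concrete, here is a minimal sketch that simulates a two-variable VAR(1) in which x drives y but not the reverse, then recovers the lag coefficients by ordinary least squares. This is plain OLS on an ordinary VAR, not the paper's fictitious VAR or its causality distribution; the coefficient matrix `true_a` and the helper names are hypothetical and chosen only for illustration.

```python
import random


def simulate_var1(a, steps, seed=0):
    """Simulate a zero-mean bivariate VAR(1) with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    series = [(x, y)]
    for _ in range(steps):
        x_new = a[0][0] * x + a[0][1] * y + rng.gauss(0, 1)
        y_new = a[1][0] * x + a[1][1] * y + rng.gauss(0, 1)
        x, y = x_new, y_new
        series.append((x, y))
    return series


def ols_two_regressors(targets, regressors):
    """Solve the 2x2 normal equations for a regression without intercept."""
    s11 = sum(r[0] * r[0] for r in regressors)
    s12 = sum(r[0] * r[1] for r in regressors)
    s22 = sum(r[1] * r[1] for r in regressors)
    t1 = sum(r[0] * t for r, t in zip(regressors, targets))
    t2 = sum(r[1] * t for r, t in zip(regressors, targets))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Hypothetical truth: x depends only on its own lag; y depends on lagged x and lagged y.
true_a = [[0.5, 0.0], [0.4, 0.3]]
series = simulate_var1(true_a, steps=5000)
lagged, current = series[:-1], series[1:]

# Each tuple is (coefficient on lagged x, coefficient on lagged y).
x_coeffs = ols_two_regressors([c[0] for c in current], lagged)
y_coeffs = ols_two_regressors([c[1] for c in current], lagged)
print("x equation:", x_coeffs)
print("y equation:", y_coeffs)
```

In this toy setup the estimated effect of lagged y on x should be close to zero while the effect of lagged x on y should be close to 0.4, which is the one-directional pattern the abstract describes for exogenous data.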

ML · Oct 5, 2021
Feature Selection by a Mechanism Design

Xingwei Hu

In constructing an econometric or statistical model, we pick relevant features or variables from many candidates. A coalitional game is set up to study the selection problem, where the players are the candidates and the payoff function is a performance measurement in all possible modeling scenarios. Thus, in theory, an irrelevant feature is equivalent to a dummy player in the game, which contributes nothing in any modeling situation. The hypothesis test of zero mean contribution is the rule for deciding whether a feature is irrelevant. In our mechanism design, the end goal is to perfectly match the expected model performance with the expected sum of individual marginal effects. Within a class of noninformative likelihoods among all modeling opportunities, the matching equation results in a specific valuation for each feature. After estimating the valuation and its standard deviation, we drop any candidate feature whose valuation is not significantly different from zero. In the simulation studies, our new approach significantly outperforms several popular methods used in practice, and its accuracy is robust to the choice of the payoff function.
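The coalitional-game framing can be illustrated with a small sketch in which features are players and a payoff function scores each subset. The classical Shapley value is used here as a stand-in valuation; the paper's own valuation may differ. The payoff `toy_payoff` and the feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, payoff):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal effect of adding p to the coalition s.
                total += weight * (payoff(s | {p}) - payoff(s))
        values[p] = total
    return values


def toy_payoff(features):
    """Hypothetical model performance: x1 and x2 help, 'noise' never does."""
    score = 0.0
    if "x1" in features:
        score += 0.5
    if "x2" in features:
        score += 0.25
    if "x1" in features and "x2" in features:
        score += 0.125  # synergy between the two relevant features
    return score


values = shapley_values(["x1", "x2", "noise"], toy_payoff)
print(values)
```

Here `noise` is a dummy player, so its value is zero, and the values sum to the payoff of the full feature set. With an estimated payoff (for example, out-of-sample fit), the zero would only hold approximately, which is why a hypothesis test is needed.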

ML · Mar 27, 2020
Sorting Big Data by Revealed Preference with Application to College Ranking

Xingwei Hu

When ranking big data observations such as colleges in the United States, diverse consumers reveal heterogeneous preferences. The objective of this paper is to derive a linear ordering for these observations and to recommend strategies to improve their relative positions in the ranking. A properly sorted solution could help consumers make the right choices and help governments make wise policy decisions. Previous researchers have applied exogenous weighting or multivariate regression approaches to sort big data objects, ignoring their variety and variability. By recognizing the diversity and heterogeneity among both the observations and the consumers, we instead apply endogenous weighting to these contradictory revealed preferences. The outcome is a consistent steady-state solution to the counterbalance equilibrium within these contradictions. The solution takes into consideration the spillover effects of multiple-step interactions among the observations. When information from data is efficiently revealed in preferences, the revealed preferences greatly reduce the volume of data required in the sorting process. The employed approach can be applied in many other areas, such as sports team ranking, academic journal ranking, voting, and real effective exchange rates.
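One way to picture a steady-state solution with endogenous weights is a stationary distribution computed by power iteration over pairwise preference counts. This is a swapped-in, PageRank-style technique for illustration only, not the paper's counterbalance equilibrium; the matrix `wins` and the item names are hypothetical.

```python
def steady_state_scores(wins, iterations=500):
    """Power iteration on a column-normalized preference matrix.

    wins[i][j] counts consumers who prefer item i over item j.
    Assumes every item is dispreferred at least once (no zero column).
    """
    n = len(wins)
    col_totals = [sum(wins[i][j] for i in range(n)) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Item j passes its current weight to the items preferred over it,
        # so an item's score depends on the scores of the items it beats.
        scores = [
            sum(wins[i][j] / col_totals[j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


names = ["A", "B", "C"]
# Hypothetical revealed preferences, partly contradictory (C is sometimes preferred to A).
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scores = steady_state_scores(wins)
ranking = [name for _, name in sorted(zip(scores, names), reverse=True)]
print(ranking, scores)
```

Because each item's weight is determined by the weights of the items it is preferred over, the scores reflect multi-step interactions rather than raw win counts alone.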

ST · Nov 12, 2018
On Asymptotic Covariances of A Few Unrotated Factor Solutions

Xingwei Hu

In this paper, we provide explicit formulas, in terms of the covariances of sample covariances or sample correlations, for the asymptotic covariances of unrotated factor loading estimates and unique variance estimates. These estimates are extracted from least squares, principal, iterative principal component, alpha, or image factor analysis. If the sample is taken from a multivariate normal population, these formulas, together with the delta method, produce the standard errors for the rotated loading estimates. A simulation study shows that the formulas provide reasonable results.
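For readers unfamiliar with the delta method mentioned above, its standard statement is sketched below. The symbols are assumptions introduced here for illustration: \theta stacks the unrotated loadings and unique variances, \Sigma is their asymptotic covariance, and g is a differentiable map from unrotated to rotated loadings.

```latex
\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, \Sigma)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(\hat{\theta}) - g(\theta)\bigr) \xrightarrow{d} N\!\bigl(0,\; J \Sigma J^{\top}\bigr),
\qquad
J = \left.\frac{\partial g(\theta)}{\partial \theta^{\top}}\right|_{\theta}
```

Under these assumptions, approximate standard errors for the rotated loadings are the square roots of the diagonal entries of J \Sigma J^{\top} / n, evaluated at the estimates.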

ML · Aug 1, 2018
A Theory of Dichotomous Valuation with Applications to Variable Selection

Xingwei Hu

An econometric or statistical model may undergo a marginal gain if we admit a new variable to the model, and a marginal loss if we remove an existing variable from the model. Assuming equality of opportunity among all candidate variables, we derive a valuation framework by the expected marginal gain and marginal loss in all potential modeling scenarios. However, marginal gain and loss are not symmetric; thus, we introduce three unbiased solutions. When used in variable selection, our new approaches significantly outperform several popular methods used in practice. The results also explore some novel traits of the Shapley value.
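The distinction between marginal gain and marginal loss can be sketched on a toy game. The scenario distribution below is an assumption made here, not taken from the paper: the model size is drawn uniformly from 0 to n, then a model of that size is drawn uniformly. The payoff `toy_payoff` and variable names are hypothetical.

```python
from itertools import combinations
from math import comb


def toy_payoff(model):
    """Hypothetical model performance: x1 and x2 help, 'noise' never does."""
    score = 0.0
    if "x1" in model:
        score += 0.5
    if "x2" in model:
        score += 0.25
    if "x1" in model and "x2" in model:
        score += 0.125
    return score


def expected_gain_and_loss(variable, candidates, payoff):
    """Expected marginal gain (admitting) and loss (removing) for one variable."""
    n = len(candidates)
    others = [c for c in candidates if c != variable]
    gain = loss = 0.0
    for k in range(n):
        for subset in combinations(others, k):
            without = frozenset(subset)
            delta = payoff(without | {variable}) - payoff(without)
            # Scenario is the model of size k that lacks the variable: admit it.
            gain += delta / ((n + 1) * comb(n, k))
            # Scenario is the model of size k + 1 that contains it: remove it.
            loss += delta / ((n + 1) * comb(n, k + 1))
    return gain, loss


gain, loss = expected_gain_and_loss("x1", ["x1", "x2", "noise"], toy_payoff)
print(gain, loss, gain + loss)
```

Under this assumed distribution the expected gain and expected loss of `x1` differ, which illustrates the asymmetry, and in this sketch their sum equals the classical Shapley value of `x1` (0.5625 in this toy game).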