LGMar 1, 2025Code
Hidden Convexity of Fair PCA and Fast Solver via Eigenvalue OptimizationJunhui Shen, Aaron J. Davis, Ding Lu et al.
Principal Component Analysis (PCA) is a foundational technique in machine learning for dimensionality reduction of high-dimensional datasets. However, PCA could lead to biased outcomes that disadvantage certain subgroups of the underlying datasets. To address the bias issue, a Fair PCA (FPCA) model was introduced by Samadi et al. (2018) for equalizing the reconstruction loss between subgroups. The semidefinite relaxation (SDR) based approach proposed by Samadi et al. (2018) is computationally expensive even for suboptimal solutions. To improve efficiency, several alternative variants of the FPCA model have been developed. These variants often shift the focus away from equalizing the reconstruction loss. In this paper, we identify a hidden convexity in the FPCA model and introduce an algorithm for convex optimization via eigenvalue optimization. Our approach achieves the desired fairness in reconstruction loss without sacrificing performance. As demonstrated in real-world datasets, the proposed FPCA algorithm runs $8\times$ faster than the SDR-based algorithm, and only at most 85% slower than the standard PCA.
CLOct 13, 2014
Sentiment Analysis based on User Tag for Traditional Chinese Medicine in WeiboJunhui Shen, Peiyan Zhu, Rui Fan et al.
With the acceptance of Western culture and science, Traditional Chinese Medicine (TCM) has become a controversial issue in China. So, it's important to study the public's sentiment and opinion on TCM. The rapid development of online social network, such as twitter, make it convenient and efficient to sample hundreds of millions of people for the aforementioned sentiment study. To the best of our knowledge, the present work is the first attempt that applies sentiment analysis to the domain of TCM on Sina Weibo (a twitter-like microblogging service in China). In our work, firstly we collect tweets topic about TCM from Sina Weibo, and label the tweets as supporting TCM and opposing TCM automatically based on user tag. Then, a support vector machine classifier has been built to predict the sentiment of TCM tweets without labels. Finally, we present a method to adjust the classifier result. The performance of F-measure attained with our method is 97%.