LGSep 13, 2022
Rényi Divergence Deep Mutual LearningWeipeng Huang, Junjie Tao, Changbo Deng et al.
This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose using Rényi divergence instead of the KL divergence, which is more flexible and tunable, to improve vanilla DML. This modification is able to consistently improve performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and Rényi divergence, leading to further improvement in model generalization.
LGFeb 20, 2025
Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space ShiftsWeipeng Huang, Qin Li, Yang Xiao et al.
Noise in data appears to be inevitable in most real-world machine learning applications and would cause severe overfitting problems. Not only can data features contain noise, but labels are also prone to be noisy due to human input. In this paper, rather than noisy label learning in multiclass classifications, we instead focus on the less explored area of noisy label learning for multilabel classifications. Specifically, we investigate the post-correction of predictions generated from classifiers learned with noisy labels. The reasons are two-fold. Firstly, this approach can directly work with the trained models to save computational resources. Secondly, it could be applied on top of other noisy label correction techniques to achieve further improvements. To handle this problem, we appeal to deep generative approaches that are possible for uncertainty estimation. Our model posits that label noise arises from a stochastic shift in the latent variable, providing a more robust and beneficial means for noisy learning. We develop both unsupervised and semi-supervised learning methods for our model. The extensive empirical study presents solid evidence to that our approach is able to consistently improve the independent models and performs better than a number of existing methods across various noisy label settings. Moreover, a comprehensive empirical analysis of the proposed method is carried out to validate its robustness, including sensitivity analysis and an ablation study, among other elements.
LGJul 6, 2025
Enhancing Text-Based Hierarchical Multilabel Classification for Mobile Applications via Contrastive LearningJiawei Guo, Yang Xiao, Weipeng Huang et al.
A hierarchical labeling system for mobile applications (apps) benefits a wide range of downstream businesses that integrate the labeling with their proprietary user data, to improve user modeling. Such a label hierarchy can define more granular labels that capture detailed app features beyond the limitations of traditional broad app categories. In this paper, we address the problem of hierarchical multilabel classification for apps by using their textual information such as names and descriptions. We present: 1) HMCN (Hierarchical Multilabel Classification Network) for handling the classification from two perspectives: the first focuses on a multilabel classification without hierarchical constraints, while the second predicts labels sequentially at each hierarchical level considering such constraints; 2) HMCL (Hierarchical Multilabel Contrastive Learning), a scheme that is capable of learning more distinguishable app representations to enhance the performance of HMCN. Empirical results on our Tencent App Store dataset and two public datasets demonstrate that our approach performs well compared with state-of-the-art methods. The approach has been deployed at Tencent and the multilabel classification outputs for apps have helped a downstream task--credit risk management of user--improve its performance by 10.70% with regard to the Kolmogorov-Smirnov metric, for over one year.
CLJan 25, 2024
Demystifying Chains, Trees, and Graphs of ThoughtsMaciej Besta, Florim Memedi, Zhenyu Zhang et al.
The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM's capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and other parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques.
MLAug 19, 2019
Partially Observable Markov Decision Process Modelling for Assessing HierarchiesWeipeng Huang, Guangyuan Piao, Raul Moreno et al.
Hierarchical clustering has been shown to be valuable in many scenarios. Despite its usefulness to many situations, there is no agreed methodology on how to properly evaluate the hierarchies produced from different techniques, particularly in the case where ground-truth labels are unavailable. This motivates us to propose a framework for assessing the quality of hierarchical clustering allocations which covers the case of no ground-truth information. This measurement is useful, e.g., to assess the hierarchical structures used by online retailer websites to display their product catalogues. Our framework is one of the few attempts for the hierarchy evaluation from a decision-theoretic perspective. We model the process as a bot searching stochastically for items in the hierarchy and establish a measure representing the degree to which the hierarchy supports this search. We employ Partially Observable Markov Decision Processes (POMDP) to model the uncertainty, the decision making, and the cognitive return for searchers in such a scenario.
MLMay 13, 2019
Inferring Hierarchical Mixture Structures: A Bayesian Nonparametric ApproachWeipeng Huang, Nishma Laitonjam, Guangyuan Piao et al.
This paper focuses on the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node and each internal node is associated with all the data items stored in the sub-tree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method combining the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet Process (HDP). Compared with other existing Bayesian approaches, our solution tackles data with complex latent mixture features which has not been previously explored in the literature. We discuss the details of the model and the inference procedure. Furthermore, experiments on three datasets show that our method achieves solid empirical results in comparison with existing algorithms.
IRDec 20, 2017
Inferring User Interests in Microblogging Social Networks: A SurveyGuangyuan Piao, John G. Breslin
With the growing popularity of microblogging services such as Twitter in recent years, an increasing number of users are using these services in their daily lives. The huge volume of information generated by users raises new opportunities in various applications and areas. Inferring user interests plays a significant role in providing personalized recommendations on microblogging services, and also on third-party applications providing social logins via these services, especially in cold-start situations. In this survey, we review user modeling strategies with respect to inferring user interests from previous studies. To this end, we focus on four dimensions of inferring user interest profiles: (1) data collection, (2) representation of user interest profiles, (3) construction and enhancement of user interest profiles, and (4) the evaluation of the constructed profiles. Through this survey, we aim to provide an overview of state-of-the-art user modeling strategies for inferring user interest profiles on microblogging social networks with respect to the four dimensions. For each dimension, we review and summarize previous studies based on specified criteria. Finally, we discuss some challenges and opportunities for future work in this research domain.
IRJul 18, 2017
Factorization Machines Leveraging Lightweight Linked Open Data-enabled Features for Top-N RecommendationsGuangyuan Piao, John G. Breslin
With the popularity of Linked Open Data (LOD) and the associated rise in freely accessible knowledge that can be accessed via LOD, exploiting LOD for recommender systems has been widely studied based on various approaches such as graph-based or using different machine learning models with LOD-enabled features. Many of the previous approaches require construction of an additional graph to run graph-based algorithms or to extract path-based features by combining user- item interactions (e.g., likes, dislikes) and background knowledge from LOD. In this paper, we investigate Factorization Machines (FMs) based on particularly lightweight LOD-enabled features which can be directly obtained via a public SPARQL Endpoint without any additional effort to construct a graph. Firstly, we aim to study whether using FM with these lightweight LOD-enabled features can provide competitive performance compared to a learning-to-rank approach leveraging LOD as well as other well-established approaches such as kNN-item and BPRMF. Secondly, we are interested in finding out to what extent each set of LOD-enabled features contributes to the recommendation performance. Experimental evaluation on a standard dataset shows that our proposed approach using FM with lightweight LOD-enabled features provides the best performance compared to other approaches in terms of five evaluation metrics. In addition, the study of the recommendation performance based on different sets of LOD-enabled features indicate that property-object lists and PageRank scores of items are useful for improving the performance, and can provide the best performance through using them together for FM. We observe that subject-property lists of items does not contribute to the recommendation performance but rather decreases the performance.