Ming Cheung

h-index11

4papers

13citations

Novelty49%

AI Score31

Ranked #132,233 of 194,257 authors (top 68%)#23,909 in CL (top 78%)

4 Papers

2.7CLJun 24, 2025

Hallucination Detection with Small Language Models

Ming Cheung

Since the introduction of ChatGPT, large language models (LLMs) have demonstrated significant utility in various tasks, such as answering questions through retrieval-augmented generation. Context can be retrieved using a vectorized database, serving as a foundation for LLMs to generate responses. However, hallucinations in responses can undermine the reliability of LLMs in practical applications, and they are not easily detectable in the absence of ground truth, particularly in question-and-answer scenarios. This paper proposes a framework that integrates multiple small language models to verify responses generated by LLMs using the retrieved context from a vectorized database. By breaking down the responses into individual sentences and utilizing the probability of generating "Yes" tokens from the outputs of multiple models for a given set of questions, responses, and relevant context, hallucinations can be detected. The proposed framework is validated through experiments with real datasets comprising over 100 sets of questions, answers, and contexts, including responses with fully and partially correct sentences. The results demonstrate a 10\% improvement in F1 scores for detecting correct responses compared to hallucinations, indicating that multiple small language models can be effectively employed for answer verification, providing a scalable and efficient solution for both academic and practical applications.

2.0LGDec 24, 2023

FedDMF: Privacy-Preserving User Attribute Prediction using Deep Matrix Factorization

Ming Cheung

User attribute prediction is a crucial task in various industries. However, sharing user data across different organizations faces challenges due to privacy concerns and legal requirements regarding personally identifiable information. Regulations such as the General Data Protection Regulation (GDPR) in the European Union and the Personal Information Protection Law of the People's Republic of China impose restrictions on data sharing. To address the need for utilizing features from multiple clients while adhering to legal requirements, federated learning algorithms have been proposed. These algorithms aim to predict user attributes without directly sharing the data. However, existing approaches typically rely on matching users across companies, which can result in dishonest partners discovering user lists or the inability to utilize all available features. In this paper, we propose a novel algorithm for predicting user attributes without requiring user matching. Our approach involves training deep matrix factorization models on different clients and sharing only the item vectors. This allows us to predict user attributes without sharing the user vectors themselves. The algorithm is evaluated using the publicly available MovieLens dataset and demonstrate that it achieves similar performance to the FedAvg algorithm, reaching 96% of a single model's accuracy. The proposed algorithm is particularly well-suited for improving customer targeting and enhancing the overall customer experience. This paper presents a valuable contribution to the field of user attribute prediction by offering a novel algorithm that addresses some of the most pressing privacy concerns in this area.

1.5CVSep 3, 2023

Face Clustering for Connection Discovery from Event Images

Ming Cheung

Social graphs are very useful for many applications, such as recommendations and community detections. However, they are only accessible to big social network operators due to both data availability and privacy concerns. Event images also capture the interactions among the participants, from which social connections can be discovered to form a social graph. Unlike online social graphs, social connections carried by event images can be extracted without user inputs, and hence many social graph-based applications become possible, even without access to online social graphs. This paper proposes a system to discover social connections from event images. By utilizing the social information from even images, such as co-occurrence, a face clustering method is proposed and implemented, and connections can be discovered without the identity of the event participants. By collecting over 40000 faces from over 3000 participants, it is shown that the faces can be well clustered with 80% in F1 score, and social graphs can be constructed. Utilizing offline event images may create a long-term impact on social network analytics.

1.2SIDec 12, 2016

Connection Discovery using Shared Images by Gaussian Relational Topic Model

Xiaopeng Li, Ming Cheung, James She

Social graphs, representing online friendships among users, are one of the fundamental types of data for many applications, such as recommendation, virality prediction and marketing in social media. However, this data may be unavailable due to the privacy concerns of users, or kept private by social network operators, which makes such applications difficult. Inferring user interests and discovering user connections through their shared multimedia content has attracted more and more attention in recent years. This paper proposes a Gaussian relational topic model for connection discovery using user shared images in social media. The proposed model not only models user interests as latent variables through their shared images, but also considers the connections between users as a result of their shared images. It explicitly relates user shared images to user connections in a hierarchical, systematic and supervisory way and provides an end-to-end solution for the problem. This paper also derives efficient variational inference and learning algorithms for the posterior of the latent variables and model parameters. It is demonstrated through experiments with over 200k images from Flickr that the proposed method significantly outperforms the methods in previous works.