Zan-Kai Chong

AI
h-index25
7papers
18citations
Novelty50%
AI Score36

7 Papers

AIDec 24, 2025
The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents

Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Autonomous agents powered by LLMs and Retrieval-Augmented Generation (RAG) are proficient consumers of digital content but remain unidirectional, a limitation we term epistemic asymmetry. This isolation leads to redundant reasoning and stagnates collective intelligence. Current self-reflection frameworks remain largely heuristic and private, lacking a probabilistic foundation to quantify certainty or justify external interaction.To bridge this gap, we propose a formal probabilistic framework that provides agents with a non-altruistic motive for bidirectional knowledge exchange. We model an agent's belief in a proposition using a Beta-Bernoulli distribution with a forgetting factor ($γ$). This allows us to isolate epistemic uncertainty as the variance of belief, establishing a dual drive for interaction: A homeostatic motive: The need to maintain certainty against the temporal decay introduced by $γ$. An optimal learning strategy: Targeting points of maximum ambiguity ($\mathbb{E}[θ]=0.5$) to maximize information gain. Under this framework, public contribution is reframed as optimal active learning: sharing solutions to elicit feedback is the most efficient method for an agent to reduce its own uncertainty. To ensure scalability, we introduce epistemic caching, which leverages the forgetting factor to dynamically prioritize resources for the active head of non-stationary knowledge distributions. Finally, we demonstrate how these accumulated belief states serve as verifiable reward signals for Reinforcement Learning from Human Feedback (RLHF) and high-quality data filters for Supervised Fine-Tuning (SFT). Simulation results validate that this uncertainty-driven strategy significantly outperforms random baselines in heterogeneous (Zipfian) environments, maintaining high adaptability to concept drift.

AIJan 13, 2025
LLM-Net: Democratizing LLMs-as-a-Service through Blockchain-based Expert Networks

Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

The centralization of Large Language Models (LLMs) development has created significant barriers to AI advancement, limiting the democratization of these powerful technologies. This centralization, coupled with the scarcity of high-quality training data and mounting complexity of maintaining comprehensive expertise across rapidly expanding knowledge domains, poses critical challenges to the continued growth of LLMs. While solutions like Retrieval-Augmented Generation (RAG) offer potential remedies, maintaining up-to-date expert knowledge across diverse domains remains a significant challenge, particularly given the exponential growth of specialized information. This paper introduces LLMs Networks (LLM-Net), a blockchain-based framework that democratizes LLMs-as-a-Service through a decentralized network of specialized LLM providers. By leveraging collective computational resources and distributed domain expertise, LLM-Net incorporates fine-tuned expert models for various specific domains, ensuring sustained knowledge growth while maintaining service quality through collaborative prompting mechanisms. The framework's robust design includes blockchain technology for transparent transaction and performance validation, establishing an immutable record of service delivery. Our simulation, built on top of state-of-the-art LLMs such as Claude 3.5 Sonnet, Llama 3.1, Grok-2, and GPT-4o, validates the effectiveness of the reputation-based mechanism in maintaining service quality by selecting high-performing respondents (LLM providers). Thereby it demonstrates the potential of LLM-Net to sustain AI advancement through the integration of decentralized expertise and blockchain-based accountability.

CRApr 24, 2025
Proof of Useful Intelligence (PoUI): Blockchain Consensus Beyond Energy Waste

Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Blockchain technology enables secure, transparent data management in decentralized systems, supporting applications from cryptocurrencies like Bitcoin to tokenizing real-world assets like property. Its scalability and sustainability hinge on consensus mechanisms balancing security and efficiency. Proof of Work (PoW), used by Bitcoin, ensures security through energy-intensive computations but demands significant resources. Proof of Stake (PoS), as in Ethereum post-Merge, selects validators based on staked cryptocurrency, offering energy efficiency but risking centralization from wealth concentration. With AI models straining computational resources, we propose Proof of Useful Intelligence (PoUI), a hybrid consensus mechanism. In PoUI, workers perform AI tasks like language processing or image analysis to earn coins, which are staked to secure the network, blending security with practical utility. Decentralized nodes--job posters, market coordinators, workers, and validators --collaborate via smart contracts to manage tasks and rewards.

LGMar 3, 2024
Improving Uncertainty Sampling with Bell Curve Weight Function

Zan-Kai Chong, Hiroyuki Ohsaki, Bok-Min Goi

Typically, a supervised learning model is trained using passive learning by randomly selecting unlabelled instances to annotate. This approach is effective for learning a model, but can be costly in cases where acquiring labelled instances is expensive. For example, it can be time-consuming to manually identify spam mails (labelled instances) from thousands of emails (unlabelled instances) flooding an inbox during initial data collection. Generally, we answer the above scenario with uncertainty sampling, an active learning method that improves the efficiency of supervised learning by using fewer labelled instances than passive learning. Given an unlabelled data pool, uncertainty sampling queries the labels of instances where the predicted probabilities, p, fall into the uncertainty region, i.e., $p \approx 0.5$. The newly acquired labels are then added to the existing labelled data pool to learn a new model. Nonetheless, the performance of uncertainty sampling is susceptible to the area of unpredictable responses (AUR) and the nature of the dataset. It is difficult to determine whether to use passive learning or uncertainty sampling without prior knowledge of a new dataset. To address this issue, we propose bell curve sampling, which employs a bell curve weight function to acquire new labels. With the bell curve centred at p=0.5, bell curve sampling selects instances whose predicted values are in the uncertainty area most of the time without neglecting the rest. Simulation results show that, most of the time bell curve sampling outperforms uncertainty sampling and passive learning in datasets of different natures and with AUR.

AISep 14, 2025
Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability

Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

The landscape of Large Language Models (LLMs) shifts rapidly towards dynamic, multi-agent systems. This introduces a fundamental challenge in establishing computational trust, specifically how one agent can verify that another's output was genuinely produced by a claimed LLM, and not falsified or generated by a cheaper or inferior model. To address this challenge, this paper proposes a verification framework that achieves tractable asymmetric effort, where the cost to verify a computation is substantially lower than the cost to perform it. Our approach is built upon the principle of deterministic replicability, a property inherent to autoregressive models that strictly necessitates a computationally homogeneous environment where all agents operate on identical hardware and software stacks. Within this defined context, our framework enables multiple validators to probabilistically audit small, random segments of an LLM's output and it distributes the verification workload effectively. The simulations demonstrated that targeted verification can be over 12 times faster than full regeneration, with tunable parameters to adjust the detection probability. By establishing a tractable mechanism for auditable LLM systems, our work offers a foundational layer for responsible AI and serves as a cornerstone for future research into the more complex, heterogeneous multi-agent systems.

LGMar 27, 2024
Collaborative Active Learning in Conditional Trust Environment

Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses privacy and security concerns by eliminating the need for direct model and data disclosure; (b) it enables the use of different data sources and insights without direct data exchange; and (c) it promotes cost-effectiveness and resource efficiency through shared labeling costs. To realize these benefits, we introduce a collaborative active learning framework designed to fulfill the aforementioned objectives. We validate the effectiveness of the proposed framework through simulations. The results demonstrate that collaboration leads to higher AUC scores compared to independent efforts, highlighting the framework's ability to overcome the limitations of individual models. These findings support the use of collaborative approaches in active learning, emphasizing their potential to enhance outcomes through collective expertise and shared resources. Our work provides a foundation for further research on collaborative active learning and its practical applications in various domains where data privacy, cost efficiency, and model performance are critical considerations.

LGMar 2, 2024
Improve Cost Efficiency of Active Learning over Noisy Dataset

Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Active learning is a learning strategy whereby the machine learning algorithm actively identifies and labels data points to optimize its learning. This strategy is particularly effective in domains where an abundance of unlabeled data exists, but the cost of labeling these data points is prohibitively expensive. In this paper, we consider cases of binary classification, where acquiring a positive instance incurs a significantly higher cost compared to that of negative instances. For example, in the financial industry, such as in money-lending businesses, a defaulted loan constitutes a positive event leading to substantial financial loss. To address this issue, we propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling. Our simulation underscores that our proposed sampling function limits both noisy and positive label selection, delivering between 20% and 32% improved cost efficiency over different test datasets.