Zi-Yu Khoo

LG
h-index: 39
10 papers
18 citations
Novelty: 31%
AI Score: 36

10 Papers

LG · Nov 13, 2025
Rediscovering the Lunar Equation of the Centre with AI Feynman via Embedded Physical Biases

Saumya Shah, Zi-Yu Khoo, Abel Yang et al.

This work explores using the physics-inspired AI Feynman symbolic regression algorithm to automatically rediscover a fundamental equation in astronomy, the Equation of the Centre. By introducing observational and inductive biases that reflect the physical nature of the system, through data preprocessing and search space restriction, AI Feynman recovered the first-order analytical form of this equation from lunar ephemerides data. However, this manual approach has a key limitation: it relies on expert-driven coordinate system selection. We therefore propose an automated preprocessing extension to find the canonical coordinate system. The results demonstrate that targeted embedding of domain knowledge enables symbolic regression to rediscover physical laws, but they also highlight further challenges in constraining symbolic regression to derive physics equations when leveraging domain knowledge through tailored biases.
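For readers unfamiliar with the target expression: the equation of the centre is the difference between the true anomaly and the mean anomaly, and its standard first-order form is 2e sin M. The sketch below compares that first-order form with a numerical solution of Kepler's equation. It is an illustration only, not the paper's pipeline; the eccentricity value, the function name `true_anomaly`, and the sampling grid are assumptions chosen for the example.

```python
import numpy as np

def true_anomaly(mean_anomaly, e, iterations=20):
    """True anomaly from mean anomaly, solving Kepler's equation E - e sin E = M by Newton's method."""
    E = np.array(mean_anomaly, dtype=float)  # copy; initial guess E = M
    for _ in range(iterations):
        E -= (E - e * np.sin(E) - mean_anomaly) / (1.0 - e * np.cos(E))
    # Half-angle form keeps the result on the same branch as M in [0, 2*pi)
    return 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(E / 2.0),
                            np.sqrt(1.0 - e) * np.cos(E / 2.0))

e = 0.0549  # assumed, illustrative eccentricity (roughly lunar)
M = np.linspace(0.0, 2.0 * np.pi, 361, endpoint=False)

exact = true_anomaly(M, e) - M          # equation of the centre, numerically
first_order = 2.0 * e * np.sin(M)       # first-order analytical form
max_err = np.max(np.abs(exact - first_order))
print(f"max |exact - first order| = {max_err:.5f} rad")
```

The residual is of order e squared, which is why a first-order form is a reasonable target at small eccentricity.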

AI · Aug 21, 2024
Physics-informed Discovery of State Variables in Second-Order and Hamiltonian Systems

Félix Chavelli, Zi-Yu Khoo, Dawen Wu et al.

The modeling of dynamical systems is a pervasive concern for not only describing but also predicting and controlling natural phenomena and engineered systems. Current data-driven approaches often assume prior knowledge of the relevant state variables or result in overparameterized state spaces. Boyuan Chen and his co-authors proposed a neural network model that estimates the degrees of freedom and attempts to discover the state variables of a dynamical system. Despite its innovative approach, this baseline model lacks a connection to the physical principles governing the systems it analyzes, leading to unreliable state variables. This research proposes a method that leverages the physical characteristics of second-order Hamiltonian systems to constrain the baseline model. The proposed model outperforms the baseline model in identifying a minimal set of non-redundant and interpretable state variables.

LG · Dec 14, 2023
What's Next? Predicting Hamiltonian Dynamics from Discrete Observations of a Vector Field

Zi-Yu Khoo, Delong Zhang, Stéphane Bressan

We present several methods for predicting the dynamics of Hamiltonian systems from discrete observations of their vector field. Each method is either informed or uninformed of the Hamiltonian property. We evaluate the methods empirically and comparatively, and observe that the knowledge that the system is Hamiltonian can be exploited effectively, and that different methods strike different trade-offs between efficiency and effectiveness for different dynamical systems.

LG · Dec 15, 2023
Celestial Machine Learning: From Data to Mars and Beyond with AI Feynman

Zi-Yu Khoo, Abel Yang, Jonathan Sze Choong Low et al.

Can a machine or algorithm discover or learn Kepler's first law from astronomical sightings alone? We emulate Johannes Kepler's discovery of the equation of the orbit of Mars with the Rudolphine tables using AI Feynman, a physics-inspired tool for symbolic regression.

AI · Sep 14, 2025
Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble

Bingchen Wang, Zi-Yu Khoo, Bryan Kian Hsiang Low

Large language models (LLMs) have demonstrated promise in emulating human-like responses across a wide range of tasks. In this paper, we propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents, affording a cost-effective and steerable solution to two pressing challenges in the social sciences: the rising cost of survey deployment and the growing demographic imbalance in survey response data. Drawing inspiration from the theory of revealed preference, we formulate alignment as a two-stage problem: constructing diverse agent personas called endowments that simulate plausible respondent profiles, and selecting a representative subset to approximate a ground-truth population based on observed data. To implement the paradigm, we introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection. Unlike personalization-heavy approaches, our alignment approach is demographic-agnostic and relies only on aggregate survey results, offering better generalizability and parsimony. Beyond improving data efficiency in social science research, our framework offers a testbed for studying the operationalization of pluralistic alignment. We demonstrate the efficacy of our approach on real-world opinion survey datasets, showing that our aligned agent populations can reproduce aggregate response patterns with high fidelity and exhibit substantial response diversity, even without demographic conditioning.

LG · Sep 9, 2025
Uncovering Scaling Laws for Large Language Models via Inverse Problems

Arun Verma, Zhaoxuan Wu, Zijian Zhou et al.

Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computation. However, because training such models is so costly, brute-force trial-and-error approaches to improving LLMs are not feasible. Inspired by the success of inverse problems in uncovering fundamental scientific laws, this position paper advocates that inverse problems can also efficiently uncover scaling laws that guide the building of LLMs to achieve the desired performance with significantly better cost-effectiveness.

ED-PH · Jun 17, 2024
A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression

Yufan Zhu, Zi-Yu Khoo, Jonathan Sze Choong Low et al.

Interleaved practice enhances the memory and problem-solving ability of students in undergraduate courses. We introduce a personalized learning tool built on a Large Language Model (LLM) that can provide immediate and personalized attention to students as they complete homework containing problems interleaved from undergraduate physics courses. Our tool leverages the dimensional analysis method, enhancing students' qualitative thinking and problem-solving skills for complex phenomena. Our approach combines LLMs for symbolic regression with dimensional analysis via prompt engineering and offers students a unique perspective to comprehend relationships between physics variables. This fosters a broader and more versatile understanding of physics and mathematical principles and complements a conventional undergraduate physics education that relies on interpreting and applying established equations within specific contexts. We test our personalized learning tool on the equations from Feynman's lectures on physics. Our tool can correctly identify relationships between physics variables for most equations, underscoring its value as a complementary personalized learning tool for undergraduate physics students.

EP · Dec 19, 2023
Celestial Machine Learning: Discovering the Planarity, Heliocentricity, and Orbital Equation of Mars with AI Feynman

Zi-Yu Khoo, Gokul Rajiv, Abel Yang et al.

Can a machine or algorithm discover or learn the elliptical orbit of Mars from astronomical sightings alone? Johannes Kepler required two paradigm shifts to discover his First Law regarding the elliptical orbit of Mars: first, a shift from the geocentric to the heliocentric frame of reference; second, the reduction of the orbit of Mars from a three- to a two-dimensional space. We extend AI Feynman, a physics-inspired tool for symbolic regression, to discover the heliocentricity and planarity of Mars' orbit and emulate Kepler's discovery of his First Law.
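For reference, Kepler's First Law is commonly written as a conic in heliocentric polar coordinates within the orbital plane, which is why the heliocentric and planar reductions matter. The symbols below (semi-major axis a, eccentricity e, angle of perihelion \theta_0) are standard notation assumed here, and this is not necessarily the exact form recovered in the paper.

```latex
r(\theta) = \frac{a\,(1 - e^{2})}{1 + e \cos(\theta - \theta_{0})}, \qquad 0 \le e < 1
```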

LG · Dec 15, 2023
A Comparative Evaluation of Additive Separability Tests for Physics-Informed Machine Learning

Zi-Yu Khoo, Jonathan Sze Choong Low, Stéphane Bressan

Many functions characterising physical systems are additively separable. Examples include mechanical Hamiltonian functions in physics, population growth equations in biology, and consumer preference and utility functions in economics. We consider the scenario in which a surrogate of a function is to be tested for additive separability. Detecting that the surrogate is additively separable can be leveraged to improve further learning, so it is beneficial to be able to test surrogates for such separability. The mathematical approach is to test whether the mixed partial derivative of the surrogate is zero or, empirically, lower than a threshold. We present eight methods to compute the mixed partial derivative of a surrogate function and evaluate them comparatively and empirically.
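The thresholded mixed-partial test described in the abstract can be sketched as follows. This is a minimal illustration using a central finite-difference estimate, which is not claimed to be one of the eight methods evaluated in the paper; the function names, step size `h`, threshold, and the two example functions are all assumptions made for the example.

```python
import numpy as np

def mixed_partial(f, x, y, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx dy) at (x, y)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

def is_additively_separable(f, points, threshold=1e-4, h=1e-3):
    """True if |mixed partial| is below the threshold at every sample point."""
    return all(abs(mixed_partial(f, x, y, h)) < threshold for x, y in points)

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(100, 2))  # sample points (q, p)

def separable(q, p):
    # H(q, p) = T(p) + V(q): no cross term, so the mixed partial is zero
    return 0.5 * p**2 + 0.5 * q**2

def coupled(q, p):
    # The q * p cross term gives a mixed partial of 1 everywhere
    return 0.5 * p**2 + q * p

print(is_additively_separable(separable, points))
print(is_additively_separable(coupled, points))
```

The threshold absorbs floating-point and approximation error; with a learned surrogate it would typically need to be looser than for the closed-form functions used here.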

LG · Sep 3, 2023
Separable Hamiltonian Neural Networks

Zi-Yu Khoo, Dawen Wu, Jonathan Sze Choong Low et al.

Hamiltonian neural networks (HNNs) are state-of-the-art models that regress the vector field of a dynamical system under the learning bias of Hamilton's equations. A recent observation is that embedding a bias regarding the additive separability of the Hamiltonian reduces the regression complexity and improves regression performance. We propose separable HNNs that embed additive separability within HNNs using observational, learning, and inductive biases. We show that the proposed models are more effective than the HNN at regressing the Hamiltonian and the vector field. Consequently, the proposed models predict the dynamics and conserve the total energy of the Hamiltonian system more accurately.
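As background for the bias being embedded: an additively separable Hamiltonian splits into a term depending only on momentum and a term depending only on position, so each of Hamilton's equations depends on a single variable and the mixed partial derivative vanishes. The notation T and V below is the usual kinetic/potential convention, assumed here for illustration.

```latex
H(q, p) = T(p) + V(q), \qquad
\frac{dq}{dt} = \frac{\partial H}{\partial p} = T'(p), \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial q} = -V'(q), \qquad
\frac{\partial^{2} H}{\partial q\, \partial p} = 0
```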