Liao Zhu

ST
9 papers
97 citations
Novelty: 39%
AI Score: 41

9 Papers

MA Apr 9
ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents

Kenan Li, Qirui Jin, Liao Zhu et al.

Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies as well as analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, the individual contribution of each signal to overall success remains underexplored, particularly their ideal contribution when intermediate information is perfectly obtained. To address this gap, we introduce Oracle-SWE, a unified method to isolate and extract oracle information signals from SWE benchmarks and quantify the impact of each signal on agent performance. To further validate the pattern, we evaluate the performance gain of signals extracted by strong LMs when provided to a base agent, approximating real-world task-resolution settings. These evaluations aim to guide research prioritization for autonomous coding systems.
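The per-signal comparison described in this abstract can be illustrated as a simple difference in resolve rates. This is a minimal sketch, not the paper's method: the function names, the signal keys, and the outcome lists are all hypothetical placeholders, and the numbers are toy inputs rather than reported results.

```python
from typing import Dict, List


def resolve_rate(outcomes: List[bool]) -> float:
    """Fraction of tasks resolved (True means the task was resolved)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def signal_gains(
    baseline: List[bool], with_signal: Dict[str, List[bool]]
) -> Dict[str, float]:
    """Gain in resolve rate over the baseline when one signal is supplied."""
    base = resolve_rate(baseline)
    return {name: resolve_rate(runs) - base for name, runs in with_signal.items()}


# Placeholder outcomes on four hypothetical tasks; NOT results from the paper.
baseline = [True, False, False, False]
oracle_runs = {
    "reproduction_test": [True, True, False, False],
    "edit_location": [True, True, True, False],
}

gains = signal_gains(baseline, oracle_runs)
# Print signals from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:+.2f}")
```

Ranking signals by this kind of gain is one plausible way to read "research prioritization"; the paper's actual protocol may differ.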

SE Mar 5
RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform

Kenan Li, Rongzhi Li, Linghao Zhang et al.

Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating systems. To demonstrate its utility, we further propose a fully automated pipeline for SWE dataset creation, where task design is the only human intervention. RepoLaunch automates the remaining steps, enabling scalable benchmarking and training of coding agents and LLMs. Notably, several works on agentic benchmarking and training have recently adopted RepoLaunch for automated task generation.

ST Jul 30, 2021
The Adaptive Multi-Factor Model and the Financial Market

Liao Zhu

Modern technological developments have profoundly influenced the financial market. The introduction of instruments such as Exchange-Traded Funds and the wide use of advanced technologies such as algorithmic trading have produced a boom in data, which provides more opportunities to reveal deeper insights. However, traditional statistical methods often struggle with the high-dimensional, highly correlated, and time-varying nature of financial data. In this dissertation, we focus on developing techniques to address these difficulties. With the proposed methodologies, we can obtain more interpretable models, clearer explanations, and better predictions.

ST Jul 5, 2021
Clustering Structure of Microstructure Measures

Liao Zhu, Ningning Sun, Martin T. Wells

This paper builds a clustering model of market microstructure measures that are popular for predicting stock returns. At a 10-second frequency, we study the clustering structure of the different measures to identify the best ones for prediction. In this way, we can predict more accurately with a limited number of predictors, which removes noise and makes the model more interpretable.

ST Jun 13, 2021
A News-based Machine Learning Model for Adaptive Asset Pricing

Liao Zhu, Haoxuan Wu, Martin T. Wells

The paper proposes a new asset pricing model, the News Embedding UMAP Selection (NEUS) model, to explain and predict stock returns based on financial news. Using a combination of machine learning algorithms, we first derive a company embedding vector for each basis asset from the financial news. We then obtain a collection of basis assets based on their company embeddings. Finally, for each stock, we select the basis assets that explain and predict its return using high-dimensional statistical methods. The new model is shown to have significantly better fitting and prediction power than the Fama-French 5-factor model.

ST Nov 9, 2020
Time-Invariance Coefficients Tests with the Adaptive Multi-Factor Model

Liao Zhu, Robert A. Jarrow, Martin T. Wells

The purpose of this paper is to test the time-invariance of the beta coefficients estimated by the Adaptive Multi-Factor (AMF) model. The AMF model is implied by the generalized arbitrage pricing theory (GAPT), which implies constant beta coefficients. The AMF model utilizes a Groupwise Interpretable Basis Selection (GIBS) algorithm to identify the relevant factors from among all traded ETFs. We compare the AMF model with the Fama-French 5-factor (FF5) model. We show that for nearly all time periods with length less than 6 years, the beta coefficients are time-invariant for the AMF model, but not for the FF5 model. This implies that the AMF model with a rolling window (such as 5 years) is more consistent with realized asset returns than is the FF5 model.
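The rolling-window idea mentioned above can be illustrated by re-estimating a beta coefficient on successive windows and checking how much it moves. This is a minimal single-factor sketch under stated assumptions: it uses plain OLS rather than the AMF/GIBS estimation, it is not the paper's time-invariance test, and the function names and return series are hypothetical toy inputs.

```python
from typing import List


def ols_beta(factor: List[float], asset: List[float]) -> float:
    """Slope of an OLS regression of asset returns on one factor's returns."""
    n = len(factor)
    mean_f = sum(factor) / n
    mean_a = sum(asset) / n
    sxx = sum((f - mean_f) ** 2 for f in factor)
    sxy = sum((f - mean_f) * (a - mean_a) for f, a in zip(factor, asset))
    if sxx == 0:
        raise ValueError("factor has zero variance in this window")
    return sxy / sxx


def rolling_betas(factor: List[float], asset: List[float], window: int) -> List[float]:
    """Beta re-estimated on every consecutive window of the given length."""
    return [
        ols_beta(factor[i : i + window], asset[i : i + window])
        for i in range(len(factor) - window + 1)
    ]


# Toy returns: the asset is constructed with a constant beta of 2 (illustration only).
factor_returns = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
asset_returns = [2 * r for r in factor_returns]

betas = rolling_betas(factor_returns, asset_returns, window=3)
spread = max(betas) - min(betas)  # near zero when beta is stable across windows
print(betas, spread)
```

A formal test would compare the windowed estimates statistically; the spread here is only a quick visual check of stability.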

ST Mar 16, 2020
The Low-volatility Anomaly and the Adaptive Multi-Factor Model

Robert A. Jarrow, Rinald Murataj, Martin T. Wells et al.

The paper provides a new explanation of the low-volatility anomaly. We use the Adaptive Multi-Factor (AMF) model estimated by the Groupwise Interpretable Basis Selection (GIBS) algorithm to find those basis assets significantly related to low and high volatility portfolios. These two portfolios load on very different factors, indicating that volatility is not an independent risk, but that it's related to existing risk factors. The out-performance of the low-volatility portfolio is due to the (equilibrium) performance of these loaded risk factors. The AMF model outperforms the Fama-French 5-factor model both in-sample and out-of-sample.

CV Jan 4, 2020
FrequentNet: A Novel Interpretable Deep Learning Model for Image Classification

Yifei Li, Kuangyan Song, Yiming Sun et al.

This paper proposes a new baseline deep learning model for image classification. Unlike standard convolutional neural network (CNN) practice, where filters are trained by backpropagation to represent different patterns in an image, we follow the approach of PCANet ("PCANet: A Simple Deep Learning Baseline for Image Classification?") and choose filter vectors from basis vectors in the frequency domain, such as Fourier coefficients or wavelets, without backpropagation. Researchers have demonstrated that such frequency-domain bases can usually provide physical insights, so analyzing the selected frequencies adds to the interpretability of the model. In addition, the training process is more time-efficient, mathematically transparent, and interpretable than the "black-box" training process of a CNN.
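The idea of taking filters from a fixed frequency-domain basis instead of learning them can be sketched as follows. This is an illustration only: a 2D DCT-II cosine basis stands in for the "Fourier coefficients or wavelets" mentioned in the abstract, the function names are hypothetical, and the paper's rule for selecting which frequencies to keep is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_filter(k: int, u: int, v: int) -> Matrix:
    """k-by-k cosine filter with vertical frequency u and horizontal frequency v."""
    return [
        [
            math.cos(math.pi * (i + 0.5) * u / k) * math.cos(math.pi * (j + 0.5) * v / k)
            for j in range(k)
        ]
        for i in range(k)
    ]


def filter_response(patch: Matrix, filt: Matrix) -> float:
    """Inner product of an image patch with a filter of the same size."""
    return sum(p * f for prow, frow in zip(patch, filt) for p, f in zip(prow, frow))


k = 3
flat_patch = [[1.0] * k for _ in range(k)]  # constant-intensity patch

# The zero-frequency filter sums the patch; higher frequencies ignore flat regions.
dc_response = filter_response(flat_patch, dct_filter(k, 0, 0))
ac_response = filter_response(flat_patch, dct_filter(k, 1, 0))
print(dc_response, ac_response)
```

Because each filter corresponds to a known frequency pair (u, v), a large response can be read directly as "this patch contains that frequency", which is the kind of interpretability the abstract refers to.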

ST Apr 23, 2018
High-Dimensional Estimation, Basis Assets, and the Adaptive Multi-Factor Model

Liao Zhu, Sumanta Basu, Robert A. Jarrow et al.

The paper proposes a new algorithm for high-dimensional financial data, the Groupwise Interpretable Basis Selection (GIBS) algorithm, to estimate a new Adaptive Multi-Factor (AMF) asset pricing model. The AMF model is implied by the recently developed Generalized Arbitrage Pricing Theory, which relaxes the convention that the number of risk factors is small. We first obtain an adaptive collection of basis assets and then simultaneously test which basis assets correspond to which securities, using high-dimensional methods. The AMF model, estimated with the GIBS algorithm, is shown to have significantly better fitting and prediction power than the Fama-French 5-factor model.
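Selecting a few relevant basis assets out of many candidates can be illustrated with an L1-penalized regression. The sketch below is a generic coordinate-descent LASSO, swapped in as a stand-in for "high-dimensional methods"; it is not the GIBS algorithm, and the function names, penalty value, and toy data are assumptions made for illustration.

```python
from typing import List


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float, iters: int = 100) -> List[float]:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * b[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Toy data: two orthogonal candidate "basis assets"; y depends only on the first.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.0 * row[0] for row in X]

coef = lasso_cd(X, y, lam=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)
```

The penalty sets the irrelevant coefficient to exactly zero, which is what makes the fitted model sparse and easier to interpret; the surviving coefficient is shrunk by the penalty, so a refit on the selected columns is a common follow-up step.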