Haocheng Ju

IR
h-index4
8papers
72citations
Novelty36%
AI Score46

8 Papers

SPMar 7, 2023
A Comparative Study of Deep Learning and Iterative Algorithms for Joint Channel Estimation and Signal Detection in OFDM Systems

Haocheng Ju, Haimiao Zhang, Lin Li et al.

Joint channel estimation and signal detection (JCESD) is crucial in orthogonal frequency division multiplexing (OFDM) systems, but traditional algorithms perform poorly in low signal-to-noise ratio (SNR) scenarios. Deep learning (DL) methods have been investigated, but concerns regarding computational expense and lack of validation in low-SNR settings remain. Hence, the development of a robust and low-complexity model that can deliver excellent performance across a wide range of SNRs is highly desirable. In this paper, we aim to establish a benchmark where traditional algorithms and DL methods are validated on different channel models, Doppler, and SNR settings, particularly focusing on the semi-blind setting. In particular, we propose a new DL model where the backbone network is formed by unrolling the iterative algorithm, and the hyperparameters are estimated by hypernetworks. Additionally, we adapt a lightweight DenseNet to the task of JCESD for comparison. We evaluate different methods in three aspects: generalization in terms of bit error rate (BER), robustness, and complexity. Our results indicate that DL approaches outperform traditional algorithms in the challenging low-SNR setting, while the iterative algorithm performs better in high-SNR settings. Furthermore, the iterative algorithm is more robust in the presence of carrier frequency offset, whereas DL methods excel when signals are corrupted by asymmetric Gaussian noise.

97.9LGApr 4
Automated Conjecture Resolution with Formal Verification

Haocheng Ju, Guoxiong Gao, Jiedong Jiang et al.

Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains challenging due to the inherent ambiguity of natural language reasoning. In this paper, we propose an automated framework for tackling research-level mathematical problems that integrates natural language reasoning with formal verification, enabling end-to-end problem solving with minimal human intervention. Our framework consists of two components: an informal reasoning agent, Rethlas, and a formal verification agent, Archon. Rethlas mimics the workflow of human mathematicians by combining reasoning primitives with our theorem search engine, Matlas, to explore solution strategies and construct candidate proofs. Archon, equipped with our formal theorem search engine LeanSearch, translates informal arguments into formalized Lean 4 projects through structured task decomposition, iterative refinement, and automated proof synthesis, ensuring machine-checkable correctness. Using this framework, we automatically resolve an open problem in commutative algebra and formally verify the resulting proof in Lean 4 with essentially no human involvement. Our experiments demonstrate that strong theorem retrieval tools enable the discovery and application of cross-domain mathematical techniques, while the formal agent is capable of autonomously filling nontrivial gaps in informal arguments. More broadly, our work illustrates a promising paradigm for mathematical research in which informal and formal reasoning systems, equipped with theorem retrieval tools, operate in tandem to produce verifiable results, substantially reduce human effort, and offer a concrete instantiation of human-AI collaborative mathematical research.

90.6IRApr 19
Matlas: A Semantic Search Engine for Mathematics

Haocheng Ju, Leheng Chen, Peihao Wu et al.

Retrieving mathematical knowledge is a central task in both human-driven research, such as determining whether a result already exists, finding related results, and identifying historical origins, and in emerging AI systems for mathematics, where reliable grounding is essential. However, the scale and structure of the mathematical literature pose significant challenges: results are distributed across millions of documents, and individual statements are often difficult to interpret in isolation due to their dependence on prior definitions and theorems. In this paper, we introduce Matlas, a semantic search engine for mathematical statements. Matlas is built on a large-scale corpus of 8.07 million statements extracted from 435K peer-reviewed papers spanning 1826 to 2025, drawn from a curated set of 180 journals selected using an ICM citation-based criterion, together with 1.9K textbooks. From these sources, we extract mathematical statements together with their dependencies, construct document-level dependency graphs, and recursively unfold statements in topological order to produce more self-contained representations. On top of this corpus, we develop a semantic retrieval system that enables efficient search for mathematical results using natural language queries. We hope that Matlas can improve the efficiency of theorem retrieval for mathematicians and provide a structured source of grounding for AI systems tackling research-level mathematical problems, and serve as part of the infrastructure for mathematical knowledge retrieval.

CLMay 27, 2025Code
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

Ziju Shen, Naohao Huang, Fanyi Yang et al.

Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthesis data collection. In our experiments, our prover using only supervised fine-tune achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset-comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).

74.1SIMar 28
Connected Theorems: A Graph-Based Approach to Evaluating Mathematical Results

Gergely Bérczi, Bin Dong, Haocheng Ju et al.

The evaluation of mathematical results plays a central role in assessing researchers' contributions and shaping the direction of the field. Currently, such evaluations rely primarily on human judgment, whether through journal peer review or committees at research institutions. To complement these traditional processes, we propose a data-driven approach. We construct a hierarchical graph linking conjectures, theorems, papers, authors and fields to capture their citation relationships. We then introduce a PageRank-style algorithm to compute influence scores for these entities. Using these scores, we analyze the evolution of field rankings over time and quantify the impact between fields. We hope this framework can contribute to the development of more advanced, quantitative methods for evaluating mathematical research and serve as a complement to expert assessment.

IRMar 20, 2024
A Semantic Search Engine for Mathlib4

Guoxiong Gao, Haocheng Ju, Jiedong Jiang et al.

The interactive theorem prover Lean enables the verification of formal mathematical proofs and is backed by an expanding community. Central to this ecosystem is its mathematical library, mathlib4, which lays the groundwork for the formalization of an expanding range of mathematical theories. However, searching for theorems in mathlib4 can be challenging. To successfully search in mathlib4, users often need to be familiar with its naming conventions or documentation strings. Therefore, creating a semantic search engine that can be used easily by individuals with varying familiarity with mathlib4 is very important. In this paper, we present a semantic search engine (https://leansearch.net/) for mathlib4 that accepts informal queries and finds the relevant theorems. We also establish a benchmark for assessing the performance of various search engines for mathlib4.

IRMay 21, 2025
MIRB: Mathematical Information Retrieval Benchmark

Haocheng Ju, Bin Dong

Mathematical Information Retrieval (MIR) is the task of retrieving information from mathematical documents and plays a key role in various applications, including theorem search in mathematical libraries, answer retrieval on math forums, and premise selection in automated theorem proving. However, a unified benchmark for evaluating these diverse retrieval tasks has been lacking. In this paper, we introduce MIRB (Mathematical Information Retrieval Benchmark) to assess the MIR capabilities of retrieval models. MIRB includes four tasks: semantic statement retrieval, question-answer retrieval, premise retrieval, and formula retrieval, spanning a total of 12 datasets. We evaluate 13 retrieval models on this benchmark and analyze the challenges inherent to MIR. We hope that MIRB provides a comprehensive framework for evaluating MIR systems and helps advance the development of more effective retrieval models tailored to the mathematical domain.

CVJan 28, 2019
CURE: Curvature Regularization For Missing Data Recovery

Bin Dong, Haocheng Ju, Yiping Lu et al.

Missing data recovery is an important and yet challenging problem in imaging and data science. Successful models often adopt certain carefully chosen regularization. Recently, the low dimension manifold model (LDMM) was introduced by S.Osher et al. and shown effective in image inpainting. They observed that enforcing low dimensionality on image patch manifold serves as a good image regularizer. In this paper, we observe that having only the low dimension manifold regularization is not enough sometimes, and we need smoothness as well. For that, we introduce a new regularization by combining the low dimension manifold regularization with a higher order Curvature Regularization, and we call this new regularization CURE for short. The key step of solving CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a similar manner as the weighted nonlocal Laplacian (WNLL) method. Numerical experiments for image inpainting and semi-supervised learning show that the proposed CURE and WeCURE significantly outperform LDMM and WNLL respectively.