Jing Leng

CL
h-index: 1
3 papers
1 citation
Novelty: 60%
AI Score: 44

3 Papers

CL · Mar 5 · Code
HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Yilin Jiang, Fei Tan, Xuanyu Yin et al.

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI
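The quota-controlled Propose-Validate-Revise loop with deduplication described in this abstract can be illustrated with a minimal sketch. This is not the HACHIMI implementation: the function names, the age rule in the toy validator, and the token-overlap similarity (a cheap stand-in for the semantic deduplication the abstract mentions) are all assumptions made for illustration.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; an assumed stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def generate_with_quotas(quotas, propose, validate, revise,
                         max_revisions=2, max_attempts=50, dup_threshold=0.8):
    """Fill each stratum's quota with validated, non-duplicate personas."""
    accepted = []
    for stratum, target in quotas.items():
        kept, attempts = 0, 0
        while kept < target and attempts < max_attempts:
            attempts += 1
            persona = propose(stratum, attempts)
            # Validate, and revise on failure, up to max_revisions times.
            for _ in range(max_revisions):
                errors = validate(stratum, persona)
                if not errors:
                    break
                persona = revise(stratum, persona, errors)
            if validate(stratum, persona):
                continue  # still invalid after revision: discard
            if any(jaccard(persona["bio"], p["bio"]) >= dup_threshold
                   for p in accepted):
                continue  # too similar to an accepted persona: discard
            accepted.append(persona)
            kept += 1
    return accepted


# Hypothetical toy agents; a real system would call an LLM here.
def toy_propose(grade, attempt):
    variant = (attempt + 1) // 2  # attempts 1 and 2 produce the same bio
    age = grade + 5 + (attempt % 3) * 5  # sometimes implausible on purpose
    return {"grade": grade, "age": age,
            "bio": f"student v{variant} in grade {grade}"}


def toy_validate(grade, persona):
    # Assumed developmental rule: age must lie in [grade + 5, grade + 7].
    ok = grade + 5 <= persona["age"] <= grade + 7
    return [] if ok else ["age outside plausible range for grade"]


def toy_revise(grade, persona, errors):
    return {**persona, "age": grade + 6}


personas = generate_with_quotas({3: 2, 7: 1},
                                toy_propose, toy_validate, toy_revise)
print(Counter(p["grade"] for p in personas))
```

In this toy run the second proposal for Grade 3 repeats the first bio and is dropped by the similarity check, so the Grade 3 quota is filled on the third attempt.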

CV · Nov 5, 2024 · Code
Precise Drive with VLM: First Prize Solution for PRCV 2024 Drive LM challenge

Bin Huang, Siyu Wang, Yuanpeng Chen et al.

This technical report outlines the methods we applied in the PRCV Challenge, which focuses on cognition and decision-making in driving scenarios. We employed InternVL-2.0, an open-source multi-modal model, and enhanced it by refining both the model input and the training procedure. For the input data, we concatenated and formatted the multi-view images, keeping the coordinates of the original images without transformation. For training, we first pre-trained the model on publicly available autonomous driving datasets to improve its alignment with the challenge tasks, then fine-tuned it on the DriveLM-nuScenes dataset. During fine-tuning, we modified the loss function to improve the model's precision in predicting coordinate values. With these changes, our model achieved a score of 0.6064, securing first prize in the competition's final results.

CR · Jan 18, 2019
Taming Distrust in the Decentralized Internet with PIXIU

Yubin Xia, Qingyuan Liu, Cheng Tan et al.

The decentralized Internet is booming. People are fascinated by its promise that users can truly own their data. However, in a decentralized Internet, completing a task usually involves multiple mutually distrusting nodes. Such distrust might eventually become a major obstacle to the growth of the decentralized Internet. In this paper, we analyze the distrust using a simple model and highlight the properties required to faithfully accomplish a task in a decentralized Internet. We also introduce our draft solution -- PIXIU, a framework to mitigate the distrust among different nodes. In PIXIU, we design and use trust-λ and a decentralized executor to achieve these properties.