Aizierjiang Aiersilan

GR
h-index4
11papers
7citations
Novelty43%
AI Score52

11 Papers

62.3LGMay 30Code
Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs

Aizierjiang Aiersilan

We investigate whether open-source LLMs encode a linearly separable truthfulness signal in their hidden states, and at which network depth this signal is strongest. Across three $7$B--$8$B instruction-tuned models (Llama-3.1-8B, Mistral-7B, Qwen2.5-7B) loaded in $4$-bit NF4 quantization, we extract per-layer hidden states on four hallucination benchmarks (TruthfulQA, HaluEval-QA, FEVER, and a controlled synthetic set) and compare four detection approaches: linear and MLP probes, INSIDE EigenScore, self-consistency, and attention entropy. A linear probe on a single mid-network layer achieves $0.904$--$1.000$ AUROC on held-out splits, while sampling-based detectors do not exceed $0.541$ AUROC under the same protocol. The truthfulness signal is approximately linear: MLP probes rarely surpass linear probes by more than $0.01$ AUROC. Peak probing layers fall in a consistent band across model families on natural-language benchmarks -- blocks~$13$--$18$ of~$32$ for Llama and Mistral, and blocks~$19$--$25$ of~$28$ for Qwen. First-block attention entropy provides a complementary signal in knowledge-grounded settings ($0.866$--$0.941$ AUROC on HaluEval-QA) at no additional inference cost. The low discriminability of sampling methods under this protocol reflects a structural mismatch between paired-label evaluation and the information these methods access, rather than an inherent limitation of those methods. Code and data are released for full reproducibility on a single $8$\,GB GPU.

55.2GRMay 2
Investigating Anthropometric Fidelity in SAM 3D Body

Aizierjiang Aiersilan, Ruting Cheng, James Hahn

The release of SAM 3D Body is a recent development in human mesh recovery, demonstrating improved performance in producing clean, topologically coherent meshes from single images. By leveraging the Momentum Human Rig (MHR), it achieves robustness to occlusion and diverse poses. However, our evaluation reveals a specific and consistent limitation: the model struggles to reconstruct detailed anthropometric deviations, particularly in populations exhibiting distinctive morphological alterations such as geriatric muscle atrophy, scoliosis, or pregnancy, even when these features are prominent in the input image. In this paper, we investigate this phenomenon not as a failure of the model's capacity, but as a byproduct of the "perception-distortion trade-off". We posit that the architectural reliance on the low-dimensional parametric MHR representation, combined with semantic-invariant conditioning (DINOv3) and annotation-based alignment, creates a pervasive "regression to the mean" effect. We analyze these mechanisms to understand why individual biological details are smoothed out. Furthermore, we state our contributions by proposing specific, constructive pathways for future work, such as implicit-explicit hybrid representations and Medical-in-the-Loop alignment, to extend the baseline performance of SAM 3D Body into the high-precision medical domain.

RODec 24, 2024Code
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner

Aizierjiang Aiersilan

Motion planning is a crucial component in autonomous driving. State-of-the-art motion planners are trained on meticulously curated datasets, which are not only expensive to annotate but also insufficient in capturing rarely seen critical scenarios. Failing to account for such scenarios poses a significant risk to motion planners and may lead to incidents during testing. An intuitive solution is to manually compose such scenarios by programming and executing a simulator (e.g., CARLA). However, this approach incurs substantial human costs. Motivated by this, we propose an inexpensive method for generating diverse critical traffic scenarios to train more robust motion planners. First, we represent traffic scenarios as scripts, which are then used by the simulator to generate traffic scenarios. Next, we develop a method that accepts user-specified text descriptions, which a Large Language Model translates into scripts using in-context learning. The output scripts are sent to the simulator that produces the corresponding traffic scenarios. As our method can generate abundant safety-critical traffic scenarios, we use them as synthetic training data for motion planners. To demonstrate the value of generated scenarios, we train existing motion planners on our synthetic data, real-world datasets, and a combination of both. Our experiments show that motion planners trained with our data significantly outperform those trained solely on real-world data, showing the usefulness of our synthetic data and the effectiveness of our data generation method. Our source code is available at https://ezharjan.github.io/AutoSceneGen.

27.5CYApr 29
Algorithmic Authority and the Clinical Standard of Care

Aizierjiang Aiersilan

The integration of artificial intelligence into clinical medicine creates a fundamental tension between algorithmic probabilistic reasoning and the experiential intuition of expert physicians; applying Lawrence Lessig's \enquote{Code is Law} framework, I argue that the architecture of clinical AI systems already functions as de facto medical regulation, reshaping liability and the standard of care. Reframing AI \enquote{hallucination} as structurally analogous to well-documented human cognitive failures such as confirmation bias and premature diagnostic closure, I show that both failure modes demand a unified governance response. I therefore propose a dialectical standard of care that treats the integrated AI-physician dyad as the singular responsible diagnostic entity, mandating the synthesis of algorithmic precision with human interpretive authority within robust data governance and patient privacy frameworks.

CVNov 5, 2025
MvBody: Multi-View-Based Hybrid Transformer Using Optical 3D Body Scan for Explainable Cesarean Section Prediction

Ruting Cheng, Boyuan Feng, Yijiang Zheng et al.

Accurately assessing the risk of cesarean section (CS) delivery is critical, especially in settings with limited medical resources, where access to healthcare is often restricted. Early and reliable risk prediction allows better-informed prenatal care decisions and can improve maternal and neonatal outcomes. However, most existing predictive models are tailored for in-hospital use during labor and rely on parameters that are often unavailable in resource-limited or home-based settings. In this study, we conduct a pilot investigation to examine the feasibility of using 3D body shape for CS risk assessment for future applications with more affordable general devices. We propose a novel multi-view-based Transformer network, MvBody, which predicts CS risk using only self-reported medical data and 3D optical body scans obtained between the 31st and 38th weeks of gestation. To enhance training efficiency and model generalizability in data-scarce environments, we incorporate a metric learning loss into the network. Compared to widely used machine learning models and the latest advanced 3D analysis methods, our method demonstrates superior performance, achieving an accuracy of 84.62% and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.724 on the independent test set. To improve transparency and trust in the model's predictions, we apply the Integrated Gradients algorithm to provide theoretically grounded explanations of the model's decision-making process. Our results indicate that pre-pregnancy weight, maternal age, obstetric history, previous CS history, and body shape, particularly around the head and shoulders, are key contributors to CS risk prediction.

39.2CRMay 7
Detecting Verbatim LLM Copy-Paste in Homework

Aizierjiang Aiersilan

Large language models (LLMs) have made fluent essay writing, code drafting, and quiz answering instantly available to students at every level, from secondary school through graduate study. Many educators do not object to LLM use \emph{per~se}; what they need to detect is the case in which a student pastes the assignment prompt into a chatbot and submits the model's reply verbatim, without engaging with the work. Existing post-hoc AI-text detectors remain unreliable and have been shown to penalise non-native English writers, while output-side watermarks require cooperation from the model provider. We propose an alternative that the educator controls directly: an input-side watermark in which an invisible instruction is embedded inside the visible assignment prompt itself. An LLM that ingests the prompt verbatim quietly reads the hidden instruction and writes a tell-tale signature into its reply, exposing the copy-and-paste pathway specifically. We describe SteganoPrompt, a single-page, zero-dependency web tool that encodes an arbitrary printable-ASCII payload into the deprecated Unicode Tags block (\texttt{U+E0000}--\texttt{U+E007F}). The encoded string is visually identical to the original, survives common copy-paste channels (Word, Google Docs, PDF, Markdown, Slack, e-mail, the major learning-management systems), and is reliably tokenized by frontier models. We evaluate compliance across seven LLM families and a representative set of educational content channels. The work is informed by my experience as a graduate teaching assistant for an undergraduate software engineering course at the George Washington University. The tool is released under the MIT licence at \url{https://ezharjan.github.io/SteganoPrompt/}.

5.2MMApr 10
Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis

Aizierjiang Aiersilan, Mohamad Koubeissi

Predicting post-surgical seizure outcomes in pharmacoresistant epilepsy is a clinical challenge. Conventional deep-learning approaches operate on static, single-timepoint pre-operative scans, omitting longitudinal morphological changes. We propose \emph{Neuro-Oracle}, a three-stage framework that: (i) distils pre-to-post-operative MRI changes into a compact 512-dimensional trajectory vector using a 3D Siamese contrastive encoder; (ii) retrieves historically similar surgical trajectories from a population archive via nearest-neighbour search; and (iii) synthesises a natural-language prognosis grounded in the retrieved evidence using a quantized Llama-3-8B reasoning agent. Evaluations are conducted on the public EPISURG dataset ($N{=}268$ longitudinally paired cases) using five-fold stratified cross-validation. Since ground-truth seizure-freedom scores are unavailable, we utilize a clinical proxy label based on the resection type. We acknowledge that the network representations may potentially learn the anatomical features of the resection cavities (i.e., temporal versus non-temporal locations) rather than true prognostic morphometry. Our current evaluation thus serves mainly as a proof-of-concept for the trajectory-aware retrieval architecture. Trajectory-based classifiers achieve AUC values between 0.834 and 0.905, compared with 0.793 for a single-timepoint ResNet-50 baseline. The Neuro-Oracle agent (M5) matches the AUC of purely discriminative trajectory classifiers (0.867) while producing structured justifications with zero observed hallucinations under our audit protocol. A Siamese Diversity Ensemble (M6) of trajectory-space classifiers attains an AUC of 0.905 without language-model overhead.

4.2NIMar 22
OrbitStream: Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields

Aizierjiang Aiersilan, Zhangfei Yang

Adaptive 360° video streaming for teleoperation faces dual challenges: viewport prediction under uncertain gaze patterns and bitrate adaptation over volatile wireless channels. While data-driven and Deep Reinforcement Learning (DRL) methods achieve high Quality of Experience (QoE), their "black-box" nature and reliance on training data can limit deployment in safety-critical systems. To address this, we propose OrbitStream, a training-free framework that combines semantic scene understanding with robust control theory. We formulate viewport prediction as a Gravitational Viewport Prediction (GVP) problem, where semantic objects generate potential fields that attract user gaze. Furthermore, we employ a Saturation-Based Proportional-Derivative (PD) Controller for buffer regulation. On object-rich teleoperation traces, OrbitStream achieves a 94.7\% zero-shot viewport prediction accuracy without user-specific profiling, approaching trajectory-extrapolation baselines ($\sim$98.5\%). Across 3,600 Monte Carlo simulations on diverse network traces, OrbitStream yields a mean QoE of 2.71. It ranks second among 12 evaluated algorithms, close to the top-performing BOLA-E (2.80) while outperforming FastMPC (1.84). The system exhibits an average decision latency of 1.01 ms with minimal rebuffering events. By providing competitive QoE with interpretability and zero training overhead, OrbitStream demonstrates that physics-based control, combined with semantic modeling, offers a practical solution for 360° streaming in teleoperation.

39.0MAMar 19
Soft-Label Governance for Distributional Safety in Multi-Agent Systems

Aizierjiang Aiersilan, Raeli Savitt

Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (\textbf{S}ystem-\textbf{W}ide \textbf{A}ssessment of \textbf{R}isk in \textbf{M}ulti-agent systems), a simulation framework that replaces binary good/bad labels with \emph{soft probabilistic labels} $p = P(v{=}+1) \in [0,1]$, enabling continuous-valued payoff computation, toxicity measurement, and governance intervention. SWARM implements a modular governance engine with configurable levers (transaction taxes, circuit breakers, reputation decay, and random audits) and quantifies their effects through probabilistic metrics including expected toxicity $\mathbb{E}[1{-}p \mid \text{accepted}]$ and quality gap $\mathbb{E}[p \mid \text{accepted}] - \mathbb{E}[p \mid \text{rejected}]$. Across seven scenarios with five-seed replication, strict governance reduces welfare by over 40\% without improving safety. In parallel, aggressively internalizing system externalities collapses total welfare from a baseline of $+262$ down to $-67$, while toxicity remains invariant. Circuit breakers require careful calibration; overly restrictive thresholds severely diminish system value, whereas an optimal threshold balances moderate welfare with minimized toxicity. Companion experiments show soft metrics detect proxy gaming by self-optimizing agents passing conventional binary evaluations. This basic governance layer applies to live LLM-backed agents (Concordia entities, Claude, GPT-4o Mini) without modification. Results show distributional safety requires \emph{continuous} risk metrics and governance lever calibration involves quantifiable safety-welfare tradeoffs. Source code and project resources are publicly available at https://www.swarm-ai.org/.

SEJan 2
The Vibe-Check Protocol: Quantifying Cognitive Offloading in AI Programming

Aizierjiang Aiersilan

The integration of Large Language Models (LLMs) into software engineering education has driven the emergence of ``Vibe Coding,'' a paradigm where developers articulate high-level intent through natural language and delegate implementation to AI agents. While proponents argue this approach modernizes pedagogy by emphasizing conceptual design over syntactic memorization, accumulating empirical evidence raises concerns regarding skill retention and deep conceptual understanding. This paper proposes a theoretical framework to investigate the research question: \textit{Is Vibe Coding a better way to learn software engineering?} We posit a divergence in student outcomes between those leveraging AI for acceleration versus those using it for cognitive offloading. To evaluate these educational trade-offs, we propose the \textbf{Vibe-Check Protocol (VCP)}, a systematic benchmarking framework incorporating three quantitative metrics: the \textit{Cold Start Refactor} ($M_{CSR}$) for modeling skill decay; \textit{Hallucination Trap Detection} ($M_{HT}$) based on signal detection theory to evaluate error identification; and the \textit{Explainability Gap} ($E_{gap}$) for quantifying the divergence between code complexity and conceptual comprehension. Through controlled comparisons, VCP aims to provide a quantitative basis for educators to determine the optimal pedagogical boundary: identifying contexts where Vibe Coding fosters genuine mastery and contexts where it introduces hidden technical debt and superficial competence.

GRDec 17, 2025
Representations of 3D Rotations: Mathematical Foundations and Comparative Analysis

Aizierjiang Aiersilan, Haochen Liu, James Hahn

Rotation representations are foundational in fields such as computer graphics, robotics, and machine learning, where precise and efficient modeling of 3D orientations is critical. This paper comprehensively investigates diverse representations of the special orthogonal group $SO(3)$, such as Euler angles, axis-angle vectors, quaternions, rotation matrices, exponential maps, and emerging continuous and probabilistic methods, evaluating their mathematical formulations, continuity, susceptibility to gimbal lock, computational efficiency, storage requirements, interpolation properties, and composition operations, while integrating detailed algebraic insights with practical applications in fields like animation, pose estimation, inertial navigation, 3D shape registration, and neural networks. Empirical evidence highlights quaternions' dominance due to their compactness and computational efficiency, while alternatives like 6D continuous representations and matrix Fisher distributions provide enhanced continuity and uncertainty modeling. Future research could explore hybrid methods and thorough large-scale evaluations to help build a solid foundation for improving rotation representation techniques.