Elaine Wong

NI
h-index: 14
6 papers
Novelty: 39%
AI Score: 46

6 Papers

81.9 AI May 11 Code
Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights

Wenbo Chen, Veena Padmanabhan, Tootiya Giyahchi et al.

Hallucination, broadly referring to unfaithful, fabricated, or inconsistent content generated by LLMs, has wide-ranging implications. Therefore, a large body of effort has been devoted to detecting LLM hallucinations, as well as designing benchmark datasets for evaluating these detectors. In this work, we first establish a desiderata of properties for hallucination detection benchmarks (HDBs) to exhibit for effective evaluation. A critical look at existing HDBs through the lens of our desiderata reveals that none of them exhibits all the properties. We identify two largest gaps: (1) RAG-based grounded benchmarks with long context are severely lacking (partly because length impedes human annotation); and (2) Existing benchmarks do not make available realistic label noise for stress-testing detectors although real-world use-cases often grapple with label noise due to human or automated/weak annotation. To close these gaps, we build and open-source a new RAG-based HDB called TRIVIA+ that underwent a rigorous human annotation process. Notably, our benchmark exhibits all desirable properties including (1) TRIVIA+ contains samples with the longest context in the literature; and (2) we design and share four sets of noisy labels with different, both sample-dependent and sample-independent, noise schemes. Finally, we perform experiments on RAG-based HDBs, including our TRIVIA+, using popular SOTA detectors that reveal new insights: (i) ample room remains for current detectors to reach the performance ceiling on RAG-based HDBs, (ii) the basic LLM-as-a-Judge baseline performs competitively, and (iii) label noise hinders detection performance. We expect that our findings, along with our proposed benchmark, will motivate and foster needed research on hallucination detection for RAG-based tasks.

14.0 QUANT-PH May 12
Classic and Quantum Task-Based Intelligent Runtime for QIRs Running on Multiple QPUs

Narasinga Rao Miniskar, Elaine Wong, Vicente Leyton-Ortega et al.

High-performance computing systems are rapidly evolving into heterogeneous platforms that fuse quantum accelerators with traditional classical processing units (CPUs) and graphical processing units (GPUs). This convergence calls for runtimes capable of managing both classical and quantum workloads in a unified manner. We introduce an intelligent, task-based runtime that marries the Intelligent RuntIme System (IRIS) asynchronous scheduler with a quantum programming stack through the Quantum Intermediate Representation Execution Engine (QIR-EE). Our design allows programs written in the quantum intermediate representation (QIR) to be dispatched concurrently to a variety of back-ends, including multiple quantum simulators and nascent quantum processors, enabling genuine hybrid execution on a single node. To illustrate its practicality, we partition a 4-qubit and 20-qubit circuit into three sub-circuits using quantum circuit cutting via the QCut library. Each sub-circuit is simulated independently by the QIR-EE driver within IRIS, after which a classical post-processing step merges the simulation results to recover the outcome of the original full-circuit computation. This case study demonstrates how finer task granularity can enable the parallel execution and lower the simulation burden per quantum task while preserving overall accuracy, highlighting the feasibility of our hybrid approach.

41.1 QUANT-PH Mar 26
Uncertainty Quantification for Quantum Computing

Ryan Bennink, Olena Burkovska, Konstantin Pieper et al.

This review is designed to introduce mathematicians and computational scientists to quantum computing (QC) through the lens of uncertainty quantification (UQ) by presenting a mathematically rigorous and accessible narrative for understanding how noise and intrinsic randomness shape quantum computational outcomes in the language of mathematics. By grounding quantum computation in statistical inference, we highlight how mathematical tools such as probabilistic modeling, stochastic analysis, Bayesian inference, and sensitivity analysis, can directly address error propagation and reliability challenges in today's quantum devices. We also connect these methods to key scientific priorities in the field, including scalable uncertainty-aware algorithms and characterization of correlated errors. The purpose is to narrow the conceptual divide between applied mathematics, scientific computing and quantum information sciences, demonstrating how mathematically rooted UQ methodologies can guide validation, error mitigation, and principled algorithm design for emerging quantum technologies, in order to address challenges and opportunities present in modern-day quantum high performance and fault-tolerant quantum computing paradigms.

NI Jul 21, 2025
Enabling Immersive XR Collaborations over FTTR Networks (Invited)

Sourav Mondal, Elaine Wong

Fiber-To-The-Room is a potential solution to achieve in-premise extended reality collaborations. This paper explores predictive bandwidth allocation and seamless handover schemes over FTTR, showing high-quality immersive experience for in-premise collaborations can be achieved. © 2025 The Author(s).

NI Jul 21, 2025
User Head Movement-Predictive XR in Immersive H2M Collaborations over Future Enterprise Networks

Sourav Mondal, Elaine Wong

The evolution towards future generation of mobile systems and fixed wireless networks is primarily driven by the urgency to support high-bandwidth and low-latency services across various vertical sectors. This endeavor is fueled by smartphones as well as technologies like industrial internet of things, extended reality (XR), and human-to-machine (H2M) collaborations for fostering industrial and social revolutions like Industry 4.0/5.0 and Society 5.0. To ensure an ideal immersive experience and avoid cyber-sickness for users in all the aforementioned usage scenarios, it is typically challenging to synchronize XR content from a remote machine to a human collaborator according to their head movements across a large geographic span in real-time over communication networks. Thus, we propose a novel H2M collaboration scheme where the human's head movements are predicted ahead with highly accurate models like bidirectional long short-term memory networks to orient the machine's camera in advance. We validate that XR frame size varies in accordance with the human's head movements and predict the corresponding bandwidth requirements from the machine's camera to propose a human-machine coordinated dynamic bandwidth allocation (HMC-DBA) scheme. Through extensive simulations, we show that end-to-end latency and jitter requirements of XR frames are satisfied with much lower bandwidth consumption over enterprise networks like Fiber-To-The-Room-Business. Furthermore, we show that better efficiency in network resource utilization is achieved by employing our proposed HMC-DBA over state-of-the-art schemes.

NI Feb 27, 2025
Scalable Coordinated Learning for H2M/R Applications over Optical Access Networks (Invited)

Sourav Mondal, Elaine Wong

One of the primary research interests adhering to next-generation fiber-wireless access networks is human-to-machine/robot (H2M/R) collaborative communications facilitating Industry 5.0. This paper discusses scalable H2M/R communications across large geographical distances that also allow rapid onboarding of new machines/robots as $\sim72\%$ training time is saved through global-local coordinated learning.