Dylan Lu

2 Papers

81.1 | AI | May 11
EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents

Gurusha Juneja, Dylan Lu, Saaket Agashe et al.

Theory of Mind (ToM), the ability to track others' epistemic states, makes humans efficient collaborators. AI agents need the same capacity in multi-agent settings, yet existing benchmarks mostly test literal ToM by asking direct belief questions. The ability to act optimally on implicit beliefs in embodied environments, called functional ToM, remains largely untested. We introduce EnactToM, an evolving benchmark of 300 embodied multi-agent tasks set in a 3D household with partial observability, private information, and constrained communication. Each task is formally verified for solvability and required epistemic depth, and new tasks are generated to increase difficulty as models improve. On the hard split, all seven evaluated frontier models score 0.0% Pass^3 on functional task completion, while averaging 45.0% on literal belief probes. Manual analysis traces 93% of sampled failures to epistemic coordination breakdowns such as withheld information, ignored partner constraints, and misallocated messages, providing a concrete target for future work.

SI | Feb 24, 2021
Community Detection in Weighted Multilayer Networks with Ambient Noise

Mark He, Dylan Lu, Jason Xu et al.

We introduce a novel model for multilayer weighted networks that accounts for global noise in addition to local signals. The model is similar to a multilayer stochastic blockmodel (SBM), but the key difference is that between-block interactions, which are independent across layers, are common to the whole system; we call this ambient noise. A single block is also characterized by these fixed ambient parameters to represent members that do not belong anywhere else. This approach allows simultaneous clustering and typologizing of blocks into signal or noise in order to better understand their roles in the overall system, a distinction that existing blockmodels do not account for. We employ a novel application of hierarchical variational inference to jointly detect and differentiate types of blocks. We call this model for multilayer weighted networks the Stochastic Block (with) Ambient Noise Model (SBANM) and develop an associated community detection algorithm. We apply this method to subjects in the Philadelphia Neurodevelopmental Cohort to discover communities of subjects with co-occurrent psychopathologies in relation to psychosis.