AI · Feb 26
ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making
Yusuke Watanabe, Yohei Kobashi, Takeshi Kojima et al.
Clinical decisions are often required under incomplete information. Clinical experts must identify whether available information is sufficient for judgment, as both premature conclusion and unnecessary abstention can compromise patient safety. To evaluate this capability of large language models (LLMs), we developed ClinDet-Bench, a benchmark based on clinical scoring systems that decomposes incomplete-information scenarios into determinable and undeterminable conditions. Identifying determinability requires considering all hypotheses about missing information, including unlikely ones, and verifying whether the conclusion holds across them. We find that recent LLMs fail to identify determinability under incomplete information, producing both premature judgments and excessive abstention, despite correctly explaining the underlying scoring knowledge and performing well under complete information. These findings suggest that existing benchmarks are insufficient to evaluate the safety of LLMs in clinical settings. ClinDet-Bench provides a framework for evaluating determinability recognition, leading to appropriate abstention, with potential applicability to medicine and other high-stakes domains, and is publicly available.
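The determinability check described in this abstract (consider every hypothesis about the missing information and verify whether the conclusion holds across all of them) can be sketched as follows. This is a minimal illustration, not the benchmark's implementation: the additive score, the item names, the point ranges, and the threshold are all hypothetical and do not correspond to any real clinical scoring system.

```python
from itertools import product

def determinability(known, missing_ranges, threshold):
    """Decide whether a threshold judgment is determinable.

    known          : dict mapping observed item -> points (hypothetical items)
    missing_ranges : dict mapping unobserved item -> tuple of possible points
    threshold      : the judgment is "positive" when total score >= threshold

    Returns (True, verdict) if every completion of the missing items gives
    the same verdict, otherwise (False, None).
    """
    base = sum(known.values())
    verdicts = set()
    # Enumerate every hypothesis about the missing items, including unlikely ones.
    for completion in product(*missing_ranges.values()):
        verdicts.add(base + sum(completion) >= threshold)
    if len(verdicts) == 1:
        return True, verdicts.pop()
    return False, None

# Determinable: the known items already reach the threshold.
print(determinability({"item_a": 2, "item_b": 1}, {"item_c": (0, 1)}, threshold=3))
# Undeterminable: the verdict depends on the missing items.
print(determinability({"item_a": 1}, {"item_c": (0, 1), "item_d": (0, 1)}, threshold=3))
```

In this sketch, answering in the first case is appropriate and abstaining would be unnecessary, while in the second case any definite answer would be premature.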
CL · Mar 26, 2017
Question Answering from Unstructured Text by Retrieval and Comprehension
Yusuke Watanabe, Bhuwan Dhingra, Ruslan Salakhutdinov
Open-domain Question Answering (QA) systems must interact with external knowledge sources, such as web pages, to find relevant information. Information sources like Wikipedia, however, are not well structured and are difficult to use in comparison with Knowledge Bases (KBs). In this work we present a two-step approach to question answering from unstructured text, consisting of a retrieval step and a comprehension step. For comprehension, we present an RNN-based attention model with a novel mixture mechanism for selecting answers from either retrieved articles or a fixed vocabulary. For retrieval, we introduce a hand-crafted model and a neural model for ranking relevant articles. We achieve state-of-the-art performance on the WikiMovies dataset, reducing the error by 40%. Our experimental results further demonstrate the importance of each of the introduced components.
CL · Jul 1, 2016
Domain Adaptation for Neural Networks by Parameter Augmentation
Yusuke Watanabe, Kazuma Hashimoto, Yoshimasa Tsuruoka
We propose a simple domain adaptation method for neural networks in a supervised setting. Supervised domain adaptation is a way of improving the generalization performance on the target domain by using the source domain dataset, assuming that both datasets are labeled. Recently, recurrent neural networks have been shown to be successful on a variety of NLP tasks such as caption generation; however, the existing domain adaptation techniques are limited to (1) tuning the model parameters on the target dataset after training on the source dataset, or (2) designing the network to have dual outputs, one for the source domain and the other for the target domain. Reformulating the idea of the domain adaptation technique proposed by Daumé (2007), we propose a simple domain adaptation method that can be applied to neural networks trained with a cross-entropy loss. On captioning datasets, we show performance improvements over other domain adaptation methods.
DS · May 2, 2010
Belief Propagation and Loop Calculus for the Permanent of a Non-Negative Matrix
Yusuke Watanabe, Michael Chertkov
We consider computation of the permanent of an $(N\times N)$ non-negative matrix, $P=(P_i^j|i,j=1,\cdots,N)$, or equivalently the problem of weighted counting of the perfect matchings over the complete bipartite graph $K_{N,N}$. The problem is known to be of likely exponential complexity. Stated as the partition function $Z$ of a graphical model, the problem allows an exact Loop Calculus representation [Chertkov, Chernyak '06] in terms of an interior minimum of the Bethe Free Energy functional over non-integer doubly stochastic matrices of marginal beliefs, $\beta=(\beta_i^j|i,j=1,\cdots,N)$, which also corresponds to a fixed point of the iterative message-passing algorithm of the Belief Propagation (BP) type. Our main result is an explicit expression of the exact partition function (permanent) in terms of the matrix of BP marginals, $\beta$, as $Z=\mbox{Perm}(P)=Z_{BP} \mbox{Perm}(\beta_i^j(1-\beta_i^j))/\prod_{i,j}(1-\beta_i^j)$, where $Z_{BP}$ is the BP expression for the permanent stated explicitly in terms of $\beta$. We give two derivations of the formula, a direct one based on the Bethe Free Energy and an alternative one combining the Ihara graph $\zeta$-function and the Loop Calculus approaches. Assuming that the matrix $\beta$ of the Belief Propagation marginals is calculated, we provide two lower bounds and one upper bound to estimate the multiplicative term. The two complementary lower bounds are based on the Gurvits-van der Waerden theorem and on a relation between the modified permanent and determinant, respectively.
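To make the quantity in this abstract concrete, the permanent can be computed exactly for small matrices, either directly as a weighted count of perfect matchings of $K_{N,N}$ or with Ryser's inclusion-exclusion formula. The sketch below is only an illustration of the exact (exponential-time) computation that the abstract refers to; it does not implement the Belief Propagation or Loop Calculus expressions from the paper, and the function names are hypothetical.

```python
from itertools import combinations, permutations
from math import prod

def permanent_matchings(P):
    """Sum over all perfect matchings i -> s[i] of the product of weights P[i][s[i]]."""
    n = len(P)
    return sum(prod(P[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def permanent_ryser(P):
    """Ryser's formula: signed sum over non-empty column subsets of row-sum products."""
    n = len(P)
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            row_product = prod(sum(P[i][j] for j in cols) for i in range(n))
            total += (-1) ** (n - k) * row_product
    return total

P = [[1, 2], [3, 4]]
print(permanent_matchings(P))  # 1*4 + 2*3 = 10
print(permanent_ryser(P))      # 10
```

Both functions take time exponential in $N$ (the first grows as $N!$, the second as $2^N$ times a polynomial), which is why approximations of the kind studied in the paper are of interest.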