Michael Fink

AI
h-index 117
14 papers
8,285 citations
Novelty 45%
AI Score 44

14 Papers

CL · Jul 25, 2022
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Deborah Cohen, Moonkyung Ryu, Yinlam Chow et al.

Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system substantially exceeds the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.
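Choosing among candidate utterances whose number and content change every turn can be illustrated with greedy action selection under a bilinear Q-function, Q(s, a) = s^T W a, evaluated over whatever candidates exist at that turn. This is a minimal sketch, not the system described in the abstract: the embeddings, the weight matrix `W`, and the names `q_value` and `select_action` are hypothetical, and training of `W` is omitted.

```python
import random


def q_value(state, action, W):
    """Bilinear score s^T W a between a state embedding and one candidate embedding."""
    return sum(
        s_i * W[i][j] * a_j
        for i, s_i in enumerate(state)
        for j, a_j in enumerate(action)
    )


def select_action(state, candidates, W, epsilon=0.0, rng=random):
    """Return the index of the chosen candidate for this turn.

    `candidates` may differ in length and content from turn to turn. With
    probability `epsilon` a uniformly random candidate is explored instead
    of the highest-scoring one.
    """
    if not candidates:
        raise ValueError("no candidate actions available this turn")
    if rng.random() < epsilon:
        return rng.randrange(len(candidates))
    scores = [q_value(state, a, W) for a in candidates]
    return max(range(len(candidates)), key=lambda k: scores[k])
```

Because each candidate is scored independently, the same `W` works for any number of candidates. A bilinear form is used instead of a linear score over the concatenated embeddings because the latter would rank the candidates identically in every state.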

CL · Dec 11, 2025
The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Aileen Cheng, Alon Jacovi, Amir Globerson et al.

We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate text across diverse scenarios. The suite provides a holistic measure of factuality by aggregating the performance of models on four distinct sub-leaderboards: (1) FACTS Multimodal, which measures the factuality of responses to image-based questions; (2) FACTS Parametric, which assesses models' world knowledge by answering closed-book factoid questions from internal parameters; (3) FACTS Search, which evaluates factuality in information-seeking scenarios, where the model must use a search API; and (4) FACTS Grounding (v2), which evaluates whether long-form responses are grounded in provided documents, featuring significantly improved judge models. Each sub-leaderboard employs automated judge models to score model responses, and the final suite score is an average of the four components, designed to provide a robust and balanced assessment of a model's overall factuality. The FACTS Leaderboard Suite will be actively maintained, containing both public and private splits to allow for external participation while guarding its integrity. It can be found at https://www.kaggle.com/benchmarks/google/facts .
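The aggregation rule stated above (the suite score is the average of the four sub-leaderboard scores) can be sketched as follows. This is a minimal illustration assuming each sub-score is already a number on a common scale; the function name `facts_suite_score` and the dictionary keys are hypothetical and not part of any published FACTS interface.

```python
from statistics import mean

# Hypothetical keys for the four sub-leaderboards named in the abstract.
SUB_LEADERBOARDS = ("multimodal", "parametric", "search", "grounding_v2")


def facts_suite_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the four sub-leaderboard scores.

    Assumes all four scores are present and on the same scale.
    """
    missing = [name for name in SUB_LEADERBOARDS if name not in scores]
    if missing:
        raise KeyError(f"missing sub-leaderboard scores: {missing}")
    return mean(scores[name] for name in SUB_LEADERBOARDS)
```

For arbitrary placeholder inputs of 0.25, 0.5, 0.75, and 1.0 the function returns 0.625. Raising on a missing component, rather than averaging whatever is present, keeps the score comparable across models.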

CL · Mar 8, 2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, Petko Georgiev, Ving Ian Lei et al. · deepmind, mila

In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

CL · Jul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its strong coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and can now process up to 3 hours of video content. Its combination of long-context, multimodal, and reasoning capabilities can unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs. cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

CL · Dec 19, 2023
Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud et al.

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use cases. Evaluation on a broad range of benchmarks shows that our most capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

CV · Oct 22, 2018
Baseline Detection in Historical Documents using Convolutional U-Nets

Michael Fink, Thomas Layer, Georg Mackenbrock et al.

Baseline detection is still a challenging task for heterogeneous collections of historical documents. We present a novel approach to baseline extraction in such settings, which was the winning entry to the ICDAR 2017 Competition on Baseline detection (cBAD). It uses deep convolutional nets (CNNs) both for the actual extraction of baselines and for a simple form of layout analysis in a pre-processing step. To the best of our knowledge, it is the first CNN-based system for baseline extraction that applies a U-Net architecture and sliding-window detection, profiting from the high local accuracy of the extracted candidate lines. A final baseline post-processing step complements our approach, compensating for inaccuracies caused mainly by missing context information during sliding-window detection. We experimentally evaluate the components of our system individually on the cBAD dataset. Moreover, we investigate how it generalizes to different data by means of the dataset used for the baseline extraction task of the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts (HisDoc). A comparison with the results reported for HisDoc shows that it also outperforms the contestants of the latter.
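Sliding-window detection runs the network on overlapping tiles of the page image and merges the per-tile outputs. The helper below sketches only the tiling step: it computes tile origins so that every pixel is covered, adding one extra tile flush with the far edge when the stride does not divide evenly. The tile size, stride, and the names `axis_starts` and `tile_origins` are hypothetical; the abstract does not specify how the tiles were laid out.

```python
def axis_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    if length <= tile:
        return [0]  # a single (possibly padded) tile covers the whole axis
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the far edge
    return starts


def tile_origins(height: int, width: int, tile: int, stride: int) -> list[tuple[int, int]]:
    """(row, column) top-left corners of all tiles, in row-major order."""
    return [
        (y, x)
        for y in axis_starts(height, tile, stride)
        for x in axis_starts(width, tile, stride)
    ]
```

For a 4 x 11 image with 4-pixel tiles and stride 3, the origins are (0, 0), (0, 3), (0, 6), and (0, 7). Overlapping predictions would then typically be averaged per pixel before post-processing.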

AI · Jul 6, 2015
A model building framework for Answer Set Programming with external computations

Thomas Eiter, Michael Fink, Giovambattista Ianni et al.

As software systems are getting increasingly connected, there is a need for equipping nonmonotonic logic programs with access to external sources that are possibly remote and may contain information in heterogeneous formats. To cater for this need, HEX programs were designed as a generalization of answer set programs with an API-style interface that allows access to arbitrary external sources, providing great flexibility. Efficient evaluation of such programs, however, is challenging: it requires interleaving external computation and model building, deciding when to switch between these tasks is difficult, and existing approaches have limited scalability in many real-world application scenarios. We present a new approach for the evaluation of logic programs with external source access, which is based on a configurable framework for dividing the non-ground program into possibly overlapping smaller parts called evaluation units. These units are processed by interleaving external evaluation and model building, using an evaluation graph and a model graph, respectively, and by combining intermediate results. Experiments with our prototype implementation show a significant improvement compared to previous approaches. While designed for HEX-programs, the new evaluation approach may be deployed to related rule-based formalisms as well.
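An evaluation graph fixes an order in which evaluation units can be processed: a unit is handled only after every unit whose models it takes as input. The ordering step alone can be sketched with Kahn's topological sort. The dictionary encoding of dependencies and the function name `evaluation_order` are hypothetical, and the sketch leaves out model building and the model graph entirely.

```python
from collections import deque


def evaluation_order(deps: dict[str, set[str]]) -> list[str]:
    """Topologically sort evaluation units (Kahn's algorithm).

    `deps[u]` is the set of units whose models unit `u` needs as input.
    Raises ValueError if the dependencies are cyclic.
    """
    units = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {u: len(deps.get(u, set())) for u in units}
    dependents: dict[str, list[str]] = {u: [] for u in units}
    for u, ds in deps.items():
        for d in ds:
            dependents[d].append(u)

    # Start with units that need no input; sort for a deterministic result.
    ready = deque(sorted(u for u in units if indegree[u] == 0))
    order: list[str] = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in sorted(dependents[u]):
            indegree[v] -= 1
            if indegree[v] == 0:  # all inputs of v are now available
                ready.append(v)

    if len(order) != len(units):
        raise ValueError("evaluation graph contains a cycle")
    return order
```

For `{"u3": {"u1", "u2"}, "u2": {"u1"}}` the order is `["u1", "u2", "u3"]`.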

AI · May 20, 2015
Towards Ideal Semantics for Analyzing Stream Reasoning

Harald Beck, Minh Dao-Tran, Thomas Eiter et al.

The rise of smart applications has drawn interest to logical reasoning over data streams. Recently, different query languages and stream processing/reasoning engines were proposed in different communities. However, due to a lack of theoretical foundations, the expressivity and semantics of these diverse approaches are given only informally. Towards clear specifications and means for analytic study, a formal framework is needed to define their semantics in precise terms. To this end, we present a first step towards an ideal semantics that allows for exact descriptions and comparisons of stream reasoning systems.

AI · Dec 30, 2014
Workshop Notes of the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic (ARCOE-Logic 2014)

Michael Fink, Martin Homola, Alessandra Mileo

ARCOE-Logic 2014, the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic, was held in co-location with the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014) on November 25, 2014 in Linköping, Sweden. These notes contain the five papers which were accepted and presented at the workshop.

AI · Sep 25, 2014
Causal Graph Justifications of Logic Programs

Pedro Cabalar, Jorge Fandinno, Michael Fink

In this work we propose a multi-valued extension of logic programs under the stable models semantics where each true atom in a model is associated with a set of justifications. These justifications are expressed in terms of causal graphs formed by rule labels and edges that represent their application ordering. For positive programs, we show that the causal justifications obtained for a given atom have a direct correspondence to (relevant) syntactic proofs of that atom using the program rules involved in the graphs. The most interesting contribution is that this causal information is obtained in a purely semantic way, by algebraic operations (product, sum and application) on a lattice of causal values whose ordering relation expresses when one justification is stronger than another. Finally, for programs with negation, we define the concept of causal stable model by introducing a transformation analogous to Gelfond and Lifschitz's program reduct. As a result, default negation behaves as "absence of proof" and no justification is derived from negative literals.
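To give a feel for sum and product on a lattice of causal values, the sketch below uses a deliberately simplified encoding: a causal value is a set of alternative justifications, each a set of rule labels. It assumes, as usual, that product combines joint causes and sum collects alternatives, and it keeps only subset-minimal alternatives so that absorption (a + a*b = a) holds. This is not the paper's construction: graph edges, their ordering, and the application operation are all dropped, and the names `normalize`, `c_sum`, `c_prod`, and `label` are hypothetical.

```python
from itertools import product as cartesian


def normalize(alternatives):
    """Keep only subset-minimal alternatives (lattice absorption: a + a*b = a)."""
    alts = set(alternatives)
    return frozenset(a for a in alts if not any(b < a for b in alts))


def c_sum(x, y):
    """Alternative causes: either x or y justifies the atom."""
    return normalize(x | y)


def c_prod(x, y):
    """Joint causes: pair every alternative of x with every alternative of y."""
    return normalize(a | b for a, b in cartesian(x, y))


def label(name):
    """Causal value consisting of a single rule label."""
    return frozenset({frozenset({name})})
```

With this encoding, `c_sum(label("r1"), c_prod(label("r1"), label("r2")))` absorbs back to `label("r1")`, and product distributes over sum, which is the behaviour one expects of a distributive lattice.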

AI · Dec 28, 2013
Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2013), 6th International Workshop, August 25, 2013, Istanbul, Turkey

Michael Fink, Yuliya Lierler

This volume contains the papers presented at the sixth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2013) held on August 25th, 2013 in Istanbul, co-located with the 29th International Conference on Logic Programming (ICLP 2013). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.

AI · Jan 10, 2013
Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2012), 5th International Workshop, September 4, 2012, Budapest, Hungary

Michael Fink, Yuliya Lierler

This volume contains the papers presented at the fifth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2012) held on September 4th, 2012 in Budapest, co-located with the 28th International Conference on Logic Programming (ICLP 2012). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.

LO · Jan 8, 2013
Eliminating Unfounded Set Checking for HEX-Programs

Thomas Eiter, Michael Fink, Thomas Krennwallner et al.

HEX-programs are an extension of the Answer Set Programming (ASP) paradigm incorporating external means of computation into the declarative programming language through so-called external atoms. Their semantics is defined in terms of minimal models of the Faber-Leone-Pfeifer (FLP) reduct. Developing native solvers for HEX-programs based on an appropriate notion of unfounded sets has been subject to recent research for reasons of efficiency. Although this has led to an improvement over naive minimality checking using the FLP reduct, testing for foundedness remains a computationally expensive task. In this work we improve on HEX-program evaluation in this respect by identifying a syntactic class of programs that can be efficiently recognized and for which the foundedness check can be skipped entirely. Moreover, we develop criteria for decomposing a program into components, such that the search for unfounded sets can be restricted. Observing that our results apply to many HEX-program applications provides analytic evidence for the significance and effectiveness of our approach, which is complemented by a brief discussion of preliminary experimental validation.
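Decomposing a program into components commonly starts from the strongly connected components (SCCs) of a dependency graph. The sketch below computes SCCs with Tarjan's algorithm over a hypothetical atom-level dependency graph given as an adjacency dictionary. It only produces the components; the paper's actual decomposition criteria, which also account for external atoms, are not reproduced, and the function name is illustrative.

```python
def strongly_connected_components(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm; returns each SCC as a sorted list of nodes."""
    index_of: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    on_stack: set[str] = set()
    stack: list[str] = []
    components: list[list[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index_of[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of an SCC: pop all of its members off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    for v in sorted(nodes):
        if v not in index_of:
            visit(v)
    return components
```

For `{"a": ["b"], "b": ["a", "c"], "c": []}` this yields the components `["c"]` and `["a", "b"]`. Deciding which components actually need an unfounded-set check is left out here.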

AI · Oct 5, 2012
Conflict-driven ASP Solving with External Sources

Thomas Eiter, Michael Fink, Thomas Krennwallner et al.

Answer Set Programming (ASP) is a well-known problem-solving approach based on nonmonotonic logic programs and efficient solvers. To enable access to external information, HEX-programs extend programs with external atoms, which allow for a bidirectional communication between the logic program and external sources of computation (e.g., description logic reasoners and Web resources). Current solvers evaluate HEX-programs by a translation to ASP itself, in which values of external atoms are guessed and verified after the ordinary answer set computation. This elegant approach does not scale with the number of external accesses in general, in particular in the presence of nondeterminism (which is instrumental for ASP). In this paper, we present a novel, native algorithm for evaluating HEX-programs which uses learning techniques. In particular, we extend conflict-driven ASP solving techniques, which prevent the solver from running into the same conflict again, from ordinary to HEX-programs. We show how to gain additional knowledge from external source evaluations and how to use it in a conflict-driven algorithm. We first target the uninformed case, i.e., when we have no extra information on external sources, and then extend our approach to the case where additional meta-information is available. Experiments show that learning from external sources can significantly decrease both the runtime and the number of considered candidate compatible sets.
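One simple form of knowledge gained from an external source evaluation is a nogood recording its input/output behaviour: after calling the source once, the solver learns that the current input pattern together with the opposite truth value of the external atom can never occur. The sketch below assumes a complete assignment (unassigned atoms count as false) and encodes literals as `(atom, value)` pairs; the names `learn_from_external_call` and `violates`, and the oracle signature, are hypothetical and not an existing solver API.

```python
from collections.abc import Callable

SignedLiteral = tuple[str, bool]      # (atom, truth value)
Nogood = frozenset[SignedLiteral]     # literals that must not all hold together


def learn_from_external_call(
    ext_atom: str,
    input_atoms: list[str],
    assignment: dict[str, bool],
    oracle: Callable[[frozenset[str]], bool],
) -> Nogood:
    """Evaluate the external source once and return a nogood recording the result.

    The nogood forbids the current input pattern combined with the opposite
    truth value of `ext_atom`, so a solver storing it cannot later guess a
    value for `ext_atom` that contradicts the observed output for this input.
    """
    true_inputs = frozenset(a for a in input_atoms if assignment.get(a, False))
    output = oracle(true_inputs)
    input_literals = {(a, assignment.get(a, False)) for a in input_atoms}
    return frozenset(input_literals | {(ext_atom, not output)})


def violates(assignment: dict[str, bool], nogood: Nogood) -> bool:
    """True if every literal of the nogood holds in the (complete) assignment."""
    return all(assignment.get(atom, False) == value for atom, value in nogood)
```

For an oracle that is true exactly when `p` is in its input, the assignment `{"p": True, "q": False}` yields the nogood `{("p", True), ("q", False), ("e", False)}`, which rules out guessing `e` false under that input.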