Benjamin Meyer

AI
h-index8
3papers
54citations
Novelty25%
AI Score28

3 Papers

AIJan 27, 2025
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

Pascal J. Sager, Benjamin Meyer, Peng Yan et al.

Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices - such as desktops, mobile phones, and web platforms - given instructions in natural language. These agents can automate tasks by controlling software via low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for everyday use. In this survey, we investigate the state-of-the-art, trends, and research gaps in the development of practical ACUs. We provide a comprehensive review of the ACU landscape, introducing a unifying taxonomy spanning three dimensions: (I) the domain perspective, characterizing agent operating contexts; (II) the interaction perspective, describing observation modalities (e.g., screenshots, HTML) and action modalities (e.g., mouse, keyboard, code execution); and (III) the agent perspective, detailing how agents perceive, reason, and learn. We review 87 ACUs and 33 datasets across foundation model-based and classical approaches through this taxonomy. Our analysis identifies six major research gaps: insufficient generalization, inefficient learning, limited planning, low task complexity in benchmarks, non-standardized evaluation, and a disconnect between research and practical conditions. To address these gaps, we advocate for: (a) vision-based observations and low-level control to enhance generalization; (b) adaptive learning beyond static prompting; (c) effective planning and reasoning methods and models; (d) benchmarks that reflect real-world task complexity; (e) standardized evaluation based on task success; (f) aligning agent design with real-world deployment constraints. Together, our taxonomy and analysis establish a foundation for advancing ACU research toward general-purpose agents for robust and scalable computer use.

CVJul 11, 2025
A document is worth a structured record: Principled inductive bias design for document recognition

Benjamin Meyer, Lukas Tuggener, Sascha Hänzi et al.

Many document types use intrinsic, convention-driven structures that serve to encode precise and structured information, such as the conventions governing engineering drawings. However, state-of-the-art approaches treat document recognition as a mere computer vision problem, neglecting these underlying document-type-specific structural properties, making them dependent on sub-optimal heuristic post-processing and rendering many less frequent or more complicated document types inaccessible to modern document recognition. We suggest a novel perspective that frames document recognition as a transcription task from a document to a record. This implies a natural grouping of documents based on the intrinsic structure inherent in their transcription, where related document types can be treated (and learned) similarly. We propose a method to design structure-specific inductive biases for the underlying machine-learned end-to-end document recognition systems, and a respective base transformer architecture that we successfully adapt to different structures. We demonstrate the effectiveness of the so-found inductive biases in extensive experiments with progressively complex record structures from monophonic sheet music, shape drawings, and simplified engineering drawings. By integrating an inductive bias for unrestricted graph structures, we train the first-ever successful end-to-end model to transcribe engineering drawings to their inherently interlinked information. Our approach is relevant to inform the design of document recognition systems for document types that are less well understood than standard OCR, OMR, etc., and serves as a guide to unify the design of future document foundation models.

CRMar 13, 2020
Methods for Actors in the Electric Power System to Prevent, Detect and React to ICT Attacks and Failures

Dennis van der Velde, Martin Henze, Philipp Kathmann et al.

The fundamental changes in power supply and increasing decentralization require more active grid operation and an increased integration of ICT at all power system actors. This trend raises complexity and increasingly leads to interactions between primary grid operation and ICT as well as different power system actors. For example, virtual power plants control various assets in the distribution grid via ICT to jointly market existing flexibilities. Failures of ICT or targeted attacks can thus have serious effects on security of supply and system stability. This paper presents a holistic approach to providing methods specifically for actors in the power system for prevention, detection, and reaction to ICT attacks and failures. The focus of our measures are solutions for ICT monitoring, systems for the detection of ICT attacks and intrusions in the process network, and the provision of actionable guidelines as well as a practice environment for the response to potential ICT security incidents.