Pascal J. Sager

AI
h-index4
3papers
24citations
Novelty43%
AI Score30

3 Papers

CVJul 8, 2024
The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns

Pascal J. Sager, Jan M. Deriu, Benjamin F. Grewe et al.

We introduce the Cooperative Network Architecture (CNA), a model that represents sensory signals using structured, recurrently connected networks of neurons, termed "nets." Nets are dynamically assembled from overlapping net fragments, which are learned based on statistical regularities in sensory input. This architecture offers robustness to noise, deformation, and generalization to out-of-distribution data, addressing challenges in current vision systems from a novel perspective. We demonstrate that net fragments can be learned without supervision and flexibly recombined to encode novel patterns, enabling figure completion and resilience to noise. Our findings establish CNA as a promising paradigm for developing neural representations that integrate local feature processing with global structure formation, providing a foundation for future research on invariant object recognition.

IRMay 29, 2025Code
Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking

Pascal J. Sager, Ashwini Kamaraj, Benjamin F. Grewe et al.

We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition, which focuses on retrieving relevant scientific literature for given social media posts. To address this task, we propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking, enabling robust retrieval that bridges the informal-to-formal language gap. Specifically, we combine BM25-based keyword matching with a FAISS vector store using a fine-tuned INF-Retriever-v1 model for dense semantic retrieval. BM25 returns the top 30 candidates, and semantic search yields 100 candidates, which are then merged and re-ranked via a large language model (LLM)-based cross-encoder. Our approach achieves a mean reciprocal rank at 5 (MRR@5) of 76.46% on the development set and 66.43% on the hidden test set, securing the 1st position on the development leaderboard and ranking 3rd on the test leaderboard (out of 31 teams), with a relative performance gap of only 2 percentage points compared to the top-ranked system. We achieve this strong performance by running open-source models locally and without external training data, highlighting the effectiveness of a carefully designed and fine-tuned retrieval pipeline.

AIJan 27, 2025
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

Pascal J. Sager, Benjamin Meyer, Peng Yan et al.

Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices - such as desktops, mobile phones, and web platforms - given instructions in natural language. These agents can automate tasks by controlling software via low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for everyday use. In this survey, we investigate the state-of-the-art, trends, and research gaps in the development of practical ACUs. We provide a comprehensive review of the ACU landscape, introducing a unifying taxonomy spanning three dimensions: (I) the domain perspective, characterizing agent operating contexts; (II) the interaction perspective, describing observation modalities (e.g., screenshots, HTML) and action modalities (e.g., mouse, keyboard, code execution); and (III) the agent perspective, detailing how agents perceive, reason, and learn. We review 87 ACUs and 33 datasets across foundation model-based and classical approaches through this taxonomy. Our analysis identifies six major research gaps: insufficient generalization, inefficient learning, limited planning, low task complexity in benchmarks, non-standardized evaluation, and a disconnect between research and practical conditions. To address these gaps, we advocate for: (a) vision-based observations and low-level control to enhance generalization; (b) adaptive learning beyond static prompting; (c) effective planning and reasoning methods and models; (d) benchmarks that reflect real-world task complexity; (e) standardized evaluation based on task success; (f) aligning agent design with real-world deployment constraints. Together, our taxonomy and analysis establish a foundation for advancing ACU research toward general-purpose agents for robust and scalable computer use.