Manfred Hauswirth

CV
h-index: 5
Papers: 14
Citations: 71
Novelty: 30%
AI Score: 43

14 Papers

CV · Sep 24, 2023
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph

Jicheng Yuan, Anh Le-Tuan, Manh Nguyen-Duc et al.

The availability of vast amounts of visual data with heterogeneous features is a key factor for developing, testing, and benchmarking new computer vision (CV) algorithms and architectures. Most visual datasets are created and curated for specific tasks or with limited image data distribution for very specific situations, and there is no unified approach to manage and access them across diverse sources, tasks, and taxonomies. This not only creates unnecessary overheads when building robust visual recognition systems, but also introduces biases into learning systems and limits the capabilities of data-centric AI. To address these problems, we propose the Vision Knowledge Graph (VisionKG), a novel resource that interlinks, organizes and manages visual datasets via knowledge graphs and Semantic Web technologies. It can serve as a unified framework facilitating simple access and querying of state-of-the-art visual datasets, regardless of their heterogeneous formats and taxonomies. One of the key differences between our approach and existing methods is that ours is knowledge-based rather than metadata-based. It enriches semantics at both the image and instance levels and offers various data retrieval and exploratory services via SPARQL. VisionKG currently contains 519 million RDF triples that describe approximately 40 million entities; these are accessible at https://vision.semkg.org and through APIs. With the integration of 30 datasets and four popular CV tasks, we demonstrate its usefulness across various scenarios when working with CV pipelines.

RO · Mar 24
ROSCell: A ROS2-Based Framework for Automated Formation and Orchestration of Multi-Robot Systems

Jiangtao Shuai, Marvin Carl May, Sonja Schimmler et al.

Modern manufacturing under High-Mix-Low-Volume requirements increasingly relies on flexible and adaptive matrix production systems, which depend on interconnected heterogeneous devices and rapid task reconfiguration. To address these needs, we present ROSCell, a ROS2-based framework that enables the flexible formation and management of a computing continuum across various devices. ROSCell allows users to package existing robotic software as deployable skills and, with simple requests, assemble isolated cells, automatically deploy skill instances, and coordinate their communication to meet task objectives. It provides a scalable and low-overhead foundation for adaptive multi-robot computing in dynamic production environments. Experimental results show that, in the idle state, ROSCell substantially reduces CPU, memory, and network overhead compared to K3s-based solutions on edge devices, highlighting its energy efficiency and cost-effectiveness for large-scale deployment in production settings. The source code, examples, and documentation will be provided on GitHub.

CV · Jul 28, 2025 · Code
Collaborative Perceiver: Elevating Vision-based 3D Object Detection via Local Density-Aware Spatial Occupancy

Jicheng Yuan, Manh Nguyen Duc, Qian Liu et al.

Vision-based bird's-eye-view (BEV) 3D object detection has advanced significantly in autonomous driving by offering cost-effectiveness and rich contextual information. However, existing methods often construct BEV representations by collapsing extracted object features, neglecting intrinsic environmental contexts such as roads and pavements. This hinders detectors from comprehensively perceiving the characteristics of the physical world. To alleviate this, we introduce a multi-task learning framework, Collaborative Perceiver (CoP), that leverages spatial occupancy as auxiliary information to mine consistent structural and conceptual similarities shared between 3D object detection and occupancy prediction tasks, bridging gaps in spatial representations and feature refinement. To this end, we first propose a pipeline to generate dense occupancy ground truths incorporating local density information (LDO) for reconstructing detailed environmental information. Next, we employ a voxel-height-guided sampling (VHS) strategy to distill fine-grained local features according to distinct object properties. Furthermore, we develop a global-local collaborative feature fusion (CFF) module that seamlessly integrates complementary knowledge between both tasks, thus composing more robust BEV representations. Extensive experiments on the nuScenes benchmark demonstrate that CoP outperforms existing vision-based frameworks, achieving 49.5% mAP and 59.2% NDS on the test set. Code and supplementary materials are available at https://github.com/jichengyuan/Collaborative-Perceiver.
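
For context on the reported metrics: the nuScenes Detection Score (NDS), as defined by the nuScenes benchmark, combines mAP with five true-positive error metrics (translation, scale, orientation, velocity, and attribute errors). The formula below is the benchmark's standard definition, not something specific to CoP; the second line is only arithmetic on the two numbers quoted above, showing how much of the score comes from the error terms.

```latex
\mathrm{NDS} = \frac{1}{10}\Big[\, 5\,\mathrm{mAP} + \sum_{\mathrm{mTP} \in \mathbb{TP}} \big(1 - \min(1, \mathrm{mTP})\big) \Big],
\qquad \mathbb{TP} = \{\mathrm{mATE}, \mathrm{mASE}, \mathrm{mAOE}, \mathrm{mAVE}, \mathrm{mAAE}\}

\sum_{\mathrm{mTP} \in \mathbb{TP}} \big(1 - \min(1, \mathrm{mTP})\big) = 10(0.592) - 5(0.495) = 5.92 - 2.475 = 3.445 \quad (\text{maximum } 5)
```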

CV · Apr 2, 2024 · Code
Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection

Jicheng Yuan, Anh Le-Tuan, Manfred Hauswirth et al.

Unsupervised Domain Adaptation (UDA) has shown significant advancements in object detection under well-lit conditions; however, its performance degrades notably in low-visibility scenarios, especially at night. This poses challenges not only for its adaptability in low signal-to-noise ratio (SNR) conditions but also for the reliability and efficiency of automated vehicles. To address this problem, we propose a Cooperative Students (CoS) framework that employs global-local transformations (GLT) and a proxy-based target consistency (PTC) mechanism to effectively capture the spatial consistency in day- and night-time scenarios, and thus bridge the significant domain shift across contexts. Building upon this, we further devise an adaptive IoU-informed thresholding (AIT) module to gradually avoid overlooking potential true positives and enrich the latent information in the target domain. Comprehensive experiments show that CoS substantially enhances UDA performance in low-visibility conditions and surpasses current state-of-the-art techniques, achieving mAP increases of 3.0%, 1.9%, and 2.5% on the BDD100K, SHIFT, and ACDC datasets, respectively. Code is available at https://github.com/jichengyuan/Cooperitive_Students.
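
The abstract does not spell out how AIT uses IoU, so the following is only an illustrative sketch of what an "IoU-informed" threshold can look like. It computes the standard intersection-over-union of two boxes, then applies a hypothetical rule (`adaptive_keep`, with made-up parameters `base_thr` and `relax`) that relaxes the confidence threshold for a candidate box overlapping strongly with a reference box, so that plausible true positives are less likely to be discarded. This is not the CoS implementation.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def adaptive_keep(score, candidate, reference_boxes, base_thr=0.8, relax=0.3):
    """Hypothetical rule: lower the score threshold in proportion to the
    candidate's best overlap with any reference box."""
    max_overlap = max((iou(candidate, r) for r in reference_boxes), default=0.0)
    threshold = base_thr - relax * max_overlap
    return score >= threshold
```

With no overlapping reference box, a candidate must reach `base_thr`; with a perfectly matching reference box, the bar drops to `base_thr - relax`.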

IR · May 6, 2020 · Code
Piveau: A Large-scale Open Data Management Platform based on Semantic Web Technologies

Fabian Kirstein, Kyriakos Stefanidis, Benjamin Dittwald et al.

The publication and (re)utilization of Open Data still faces multiple barriers on technical, organizational and legal levels. This includes limitations in interfaces, search capabilities, provision of quality information and the lack of definite standards and implementation guidelines. Many Semantic Web specifications and technologies are specifically designed to address the publication of data on the web. In addition, many official publication bodies encourage and foster the development of Open Data standards based on Semantic Web principles. However, no existing solution for managing Open Data takes full advantage of these possibilities and benefits. In this paper, we present our solution "Piveau", a fully-fledged Open Data management solution based on Semantic Web technologies. It harnesses a variety of standards, like RDF, DCAT, DQV, and SKOS, to overcome the barriers in Open Data publication. The solution puts a strong focus on assuring data quality and scalability. We give a detailed description of the underlying, highly scalable, service-oriented architecture, how we integrated the aforementioned standards, and how we used a triplestore as our primary database. We have evaluated our work in a comprehensive feature comparison to established solutions and through a practical application in a production environment, the European Data Portal. Our solution is available as Open Source.

DS · Nov 21, 2024
Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices

Ali Ganbarov, Jicheng Yuan, Anh Le-Tuan et al.

In this paper, we present an experimental comparison of various graph-based approximate nearest neighbor (ANN) search algorithms deployed on edge devices for real-time nearest neighbor search applications, such as smart city infrastructure and autonomous vehicles. To the best of our knowledge, this specific comparative analysis has not been previously conducted. While existing research has explored graph-based ANN algorithms, it has often been limited to single-threaded implementations on standard commodity hardware. Our study leverages the full computational and storage capabilities of edge devices, incorporating additional metrics such as insertion and deletion latency of new vectors and power consumption. This comprehensive evaluation aims to provide valuable insights into the performance and suitability of these algorithms for edge-based real-time tracking systems enhanced by nearest-neighbor search algorithms.
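
Most of the graph-based ANN algorithms compared in such studies share one core routine: a best-first (greedy beam) search over a proximity graph, starting from an entry node and keeping the `ef` best candidates seen so far. The sketch below shows that routine in its simplest single-layer form; the graph layout (`dict` of neighbour lists), parameter names, and stopping rule are illustrative assumptions, not the code of any particular library benchmarked in the paper.

```python
import heapq
import math


def greedy_graph_search(graph, vectors, query, entry, k=1, ef=8):
    """Best-first search over a proximity graph.

    graph:   dict mapping node id -> list of neighbour node ids
    vectors: dict mapping node id -> tuple of floats
    ef:      size of the dynamic result list (larger = more accurate, slower)
    """
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap via negation: best ef nodes so far

    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:   # closest candidate is worse than the worst result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the current farthest result

    ranked = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ranked[:k]]
```

On an edge device, the metrics the paper adds (insertion and deletion latency, power draw) would be measured around calls like this one and around the graph-update routines that maintain `graph`.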

CV · Jul 22, 2025
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.

In this paper, we propose a new dataset, ReasonVQA, for the Visual Question Answering (VQA) task. Our dataset is automatically integrated with structured encyclopedic knowledge and constructed using a low-cost framework, which is capable of generating complex, multi-hop questions. We evaluated state-of-the-art VQA models on ReasonVQA, and the empirical results demonstrate that ReasonVQA poses significant challenges to these models, highlighting its potential for benchmarking and advancing the field of VQA. Additionally, our dataset can be easily scaled with respect to input images; the current version surpasses the largest existing datasets requiring external knowledge by more than an order of magnitude.
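
To illustrate what a "multi-hop" question over structured knowledge means, the sketch below chains two knowledge-graph facts about an entity into a single question whose answer requires both hops. The triple format, the question template, and the entity names (`Tower X`, `Person Y`, `City Z`) are hypothetical; in a VQA setting the starting entity would be one recognized in the image. This is not the ReasonVQA generation framework.

```python
def two_hop_questions(triples, entity):
    """Compose (question, answer) pairs from chains entity -r1-> mid -r2-> answer.

    triples: iterable of (subject, relation, object) strings.
    """
    by_subject = {}
    for s, r, o in triples:
        by_subject.setdefault(s, []).append((r, o))

    questions = []
    for r1, mid in by_subject.get(entity, []):
        for r2, answer in by_subject.get(mid, []):
            # The intermediate entity `mid` is never named in the question,
            # so answering it requires resolving both hops.
            questions.append((f"What is the {r2} of the {r1} of {entity}?", answer))
    return questions
```

Because generation only walks existing triples, the number of questions grows with the number of images and linked entities, which is consistent with the scalability the abstract describes.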

CV · Nov 28, 2024
Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras

Jicheng Yuan, Anh Le-Tuan, Ali Ganbarov et al.

Recently, deep learning has experienced rapid expansion, contributing significantly to the progress of supervised learning methodologies. However, acquiring labeled data in real-world settings can be costly, labor-intensive, and sometimes infeasible. This challenge inhibits the extensive use of neural networks for practical tasks, since labeling vast datasets for every individual application is impractical. To tackle this, semi-supervised learning (SSL) offers a promising solution by using both labeled and unlabeled data to train object detectors, potentially enhancing detection efficacy and reducing annotation costs. Nevertheless, SSL faces several challenges, including pseudo-target inconsistencies, disharmony between classification and regression tasks, and efficient use of abundant unlabeled data, especially on edge devices such as roadside cameras. Thus, we developed a teacher-student-based SSL framework, Co-Learning, which employs mutual learning and annotation-alignment strategies to adeptly navigate these complexities and achieves performance comparable to fully supervised solutions using 10% labeled data.
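
Teacher-student SSL detectors commonly rely on two ingredients: the teacher's weights track an exponential moving average (EMA) of the student's, and only confident teacher detections are kept as pseudo-targets for unlabeled images. The abstract does not say whether Co-Learning uses exactly these, so the sketch below is a generic illustration under that assumption; weights are plain `dict`s of floats and the parameter values (`momentum`, `score_threshold`) are hypothetical.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights as an exponential moving average of the student's."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}


def select_pseudo_labels(teacher_detections, score_threshold=0.7):
    """Keep only confident teacher detections as pseudo-targets for the student."""
    return [det for det in teacher_detections if det["score"] >= score_threshold]
```

A high momentum makes the teacher change slowly, which tends to stabilize pseudo-targets; the score threshold trades pseudo-label recall against noise.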

RO · Nov 27, 2024
A comparison of extended object tracking with multi-modal sensors in indoor environment

Jiangtao Shuai, Martin Baerveldt, Manh Nguyen-Duc et al.

This paper presents a preliminary study of an efficient object tracking approach, comparing the performance of two different 3D point cloud sensory sources, LiDAR and stereo cameras, which differ significantly in price. In this preliminary work, we focus on single object tracking. We first developed a fast heuristic object detector that utilizes prior information about the environment and target. The resulting target points are subsequently fed into an extended object tracking framework, where the target shape is parameterized using a star-convex hypersurface model. Experimental results show that our object tracking method using a stereo camera achieves performance similar to that of a LiDAR sensor, despite a more than tenfold difference in sensor cost.
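
A star-convex shape is one whose boundary can be described by a radius function r(θ) around a center point. A common choice in extended object tracking is a truncated Fourier series for r(θ); the 2D sketch below assumes that parameterization (coefficient order `[a0, a1, b1, a2, b2, ...]`) for illustration only, since the abstract does not state the exact form the authors use.

```python
import math


def radius(theta, coeffs):
    """Radial function r(theta) = a0/2 + sum_k (a_k cos(k theta) + b_k sin(k theta)).

    coeffs: [a0, a1, b1, a2, b2, ...] (odd length assumed).
    """
    r = coeffs[0] / 2.0
    for k in range(1, (len(coeffs) - 1) // 2 + 1):
        a_k, b_k = coeffs[2 * k - 1], coeffs[2 * k]
        r += a_k * math.cos(k * theta) + b_k * math.sin(k * theta)
    return r


def contour_point(center, theta, coeffs):
    """Boundary point of the star-convex shape in direction theta from its center."""
    r = radius(theta, coeffs)
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))
```

With only `a0 = 2.0` the shape is a unit circle; adding higher-order coefficients deforms it while keeping every boundary point visible from the center, which is what makes the model tractable for tracking from partial point clouds.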

DB · Feb 15, 2022
CQELS 2.0: Towards A Unified Framework for Semantic Stream Fusion

Anh Le-Tuan, Manh Nguyen-Duc, Chien-Quang Le et al.

We present CQELS 2.0, the second version of Continuous Query Evaluation over Linked Streams. CQELS 2.0 is a platform-agnostic federated execution framework towards semantic stream fusion. In this version, we introduce a novel neural-symbolic stream reasoning component that enables specifying deep neural network (DNN) based data fusion pipelines via logic rules with learnable probabilistic degrees as weights. As a platform-agnostic framework, CQELS 2.0 can be implemented for devices with different hardware architectures (from embedded devices to cloud infrastructures). Moreover, this version also includes an adaptive federator that allows CQELS instances on different nodes in a network to coordinate their resources and distribute processing pipelines by delegating partial workloads to their peers through subscriptions to continuous queries.
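
Continuous queries over linked streams are typically evaluated over a window of recent stream elements rather than the whole stream. The sketch below shows a minimal time-based sliding window over timestamped RDF-style triples with a single predicate match; the class, its methods, and the sample triples are hypothetical and do not reflect the CQELS API or query language. It assumes elements arrive in timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Time-based window keeping triples whose timestamp lies in (now - width, now]."""

    def __init__(self, width):
        self.width = width
        self.items = deque()   # (timestamp, (subject, predicate, object)), oldest first

    def push(self, timestamp, triple):
        self.items.append((timestamp, triple))
        # Evict elements that have fallen out of the window (in-order arrival assumed).
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def match(self, predicate):
        """Evaluate a one-pattern query (?s predicate ?o) over the current window."""
        return [t for _, t in self.items if t[1] == predicate]
```

Re-running `match` after each `push` gives the continuously updated answer; a federated setup would additionally ship such window contents or partial results between nodes.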

RO · Jan 27, 2022
SemRob: Towards Semantic Stream Reasoning for Robotic Operating Systems

Manh Nguyen-Duc, Anh Le-Tuan, Manfred Hauswirth et al.

Stream processing and reasoning are receiving considerable attention in various application domains such as IoT, Industry IoT and Smart Cities. In parallel, reasoning and knowledge-based features have attracted research in many areas of robotics, such as robotic mapping, perception and interaction. To this end, the Semantic Stream Reasoning (SSR) framework can unify the representations of symbolic/semantic streams with deep neural networks, to integrate high-dimensional data streams, such as video streams and LiDAR point clouds, with traditional graph or relational stream data. As such, this position and system paper outlines our approach to building SemRob, a platform that facilitates semantic stream reasoning capabilities on a robotic operating system.

HC · Feb 21, 2014
Analysing Parallel and Passive Web Browsing Behavior and its Effects on Website Metrics

Christian von der Weth, Manfred Hauswirth

Getting deeper insights into the online browsing behavior of Web users has been a major research topic since the advent of the WWW. It provides useful information to optimize website design, Web browser design, search engine offerings, and online advertisement. We argue that new technologies and new services continue to have significant effects on the way people browse the Web. For example, listening to music clips on YouTube or to a radio station on Last.fm does not require users to sit in front of their computer. Social media and networking sites like Facebook or micro-blogging sites like Twitter have attracted new types of users that previously were less inclined to go online. These changes in how people browse the Web feature new characteristics which are not well understood so far. In this paper, we provide novel and unique insights by presenting first results of DOBBS, our long-term effort to create a comprehensive and representative dataset capturing online user behavior. We first investigate the concepts of parallel browsing and passive browsing, showing that browsing the Web is no longer a dedicated task for many users. Based on these results, we then analyze their impact on the calculation of a user's dwell time (i.e., the time the user spends on a webpage), which has become an important metric to quantify the popularity of websites.
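
The effect of parallel browsing on dwell time can be made concrete by computing two variants from tab events: the time a page is open, and the time it actually has focus. The event format (`open`, `close`, `focus`, `blur` with timestamps in seconds) and the sample session are hypothetical and chosen only to illustrate the gap between the two measures; the abstract does not describe DOBBS's exact computation.

```python
def dwell_times(events):
    """Compute per-tab open time and focused time.

    events: chronologically ordered (time_s, tab, kind) tuples,
            kind in {"open", "close", "focus", "blur"}.
    Returns (open_time, focus_time), each a dict tab -> seconds.
    """
    opened, focused_since = {}, {}
    open_time, focus_time = {}, {}
    for t, tab, kind in events:
        if kind == "open":
            opened[tab] = t
        elif kind == "focus":
            focused_since[tab] = t
        elif kind == "blur":
            focus_time[tab] = focus_time.get(tab, 0) + t - focused_since.pop(tab)
        elif kind == "close":
            open_time[tab] = open_time.get(tab, 0) + t - opened.pop(tab)
            if tab in focused_since:   # closed while still focused
                focus_time[tab] = focus_time.get(tab, 0) + t - focused_since.pop(tab)
    return open_time, focus_time
```

In a session where tab A stays open for 50 s but has focus for only 20 s because the user switches to tab B, an open-time metric reports 50 s while a focus-based metric reports 20 s.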

IR · Jul 5, 2013
Finding Information Through Integrated Ad-Hoc Socializing in the Virtual and Physical World

Christian von der Weth, Manfred Hauswirth

Despite the services of sophisticated search engines like Google, there are a number of interesting information sources which are useful but largely inaccessible to current Web users. These information sources are often ad-hoc, location-specific and only useful for users over short periods of time, or relate to tacit knowledge of users or implicit knowledge in crowds. The solution presented in this paper addresses these problems by introducing an integrated concept of "location" and "presence" across the physical and virtual worlds, enabling ad-hoc socializing of users interested in, or looking for, similar information. While the definition of presence in the physical world is straightforward (through a spatial location and vicinity at a certain point in time), the definitions of location and presence in the virtual world are neither obvious nor trivial. Based on a detailed analysis, we provide an integrated spatial model spanning both worlds which enables us to define the presence of users in a unified way. This integrated model allows us to enable ad-hoc socializing of users browsing the Web with users in the physical world specific to their joint information needs, and thereby to unlock the untapped information sources mentioned above. We describe a proof-of-concept implementation of our model and provide an empirical analysis based on real-world experiments.

HC · Jul 5, 2013
DOBBS: Towards a Comprehensive Dataset to Study the Browsing Behavior of Online Users

Christian von der Weth, Manfred Hauswirth

The investigation of the browsing behavior of users provides useful information to optimize web site design, web browser design, search engine offerings, and online advertisement. This has been a topic of active research since the Web started and a large body of work exists. However, new online services as well as advances in Web and mobile technologies clearly changed the meaning behind "browsing the Web" and require a fresh look at the problem and research, specifically with respect to whether the models used are still appropriate. Platforms such as YouTube, Netflix or last.fm have started to replace the traditional media channels (cinema, television, radio) and media distribution formats (CD, DVD, Blu-ray). Social networks (e.g., Facebook) and platforms for browser games attracted whole new, particularly less tech-savvy audiences. Furthermore, advances in mobile technologies and devices made browsing "on-the-move" the norm and changed user behavior, since mobile browsing is often influenced by the user's location and context in the physical world. Commonly used datasets, such as web server access logs or search engine transaction logs, are inherently not capable of capturing the browsing behavior of users in all these facets. DOBBS (DERI Online Behavior Study) is an effort to create such a dataset in a non-intrusive, completely anonymous and privacy-preserving way. To this end, DOBBS provides a browser add-on that users can install, which keeps track of their browsing behavior (e.g., how much time they spend on the Web, how long they stay on a website, how often they visit a website, how they use their browser, etc.). In this paper, we outline the motivation behind DOBBS, describe the add-on and captured data in detail, and present some first results to highlight the strengths of DOBBS.