A modular architecture for creating multimodal agents
This work addresses the need for scalable and analyzable multimodal agent systems, though it appears incremental as it builds on existing modular and event-driven approaches.
The paper tackles the problem of building multimodal interactive agents by proposing a modular architecture built around an event bus, which allows sensors and interpretation components to be integrated flexibly. The resulting platform records interactions both as multimodal data and in an episodic Knowledge Graph, which supports the analysis and comparison of agents.
The paper describes a flexible and modular platform for creating multimodal interactive agents. The platform operates through an event bus on which signals and interpretations are posted in temporal sequence. Different sensors and interpretation components can be integrated by defining their inputs and outputs as topics, which results in a logical workflow for further interpretation. We describe the broad range of components developed so far and the interactive agents into which they have been integrated. We also explain how the actual interaction is recorded both as multimodal data and in a so-called episodic Knowledge Graph. The recorded interactions allow us to analyse and compare different agents and agent components.
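
The topic-based wiring described above can be illustrated with a minimal sketch. It is not the platform's actual API: the names `Event`, `EventBus`, and `connect`, the topic strings, and the toy interpretation functions are all hypothetical, and the in-memory `log` only stands in for the recorded interaction (it is not an episodic Knowledge Graph). The sketch shows components declared by an input and an output topic, so that chaining topics yields a workflow, and every posted event kept in order for later analysis.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable

_sequence = count()  # global counter giving each event its position in time


@dataclass(frozen=True)
class Event:
    topic: str
    payload: Any
    order: int = field(default_factory=lambda: next(_sequence))


class EventBus:
    """Hypothetical in-process bus: handlers subscribe to topics."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = {}
        self.log: list[Event] = []  # every posted event, in posting order

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        event = Event(topic, payload)
        self.log.append(event)  # record before dispatching to handlers
        for handler in list(self._subscribers.get(topic, ())):
            handler(event)


def connect(bus: EventBus, input_topic: str, output_topic: str,
            interpret: Callable[[Any], Any]) -> None:
    """Register a component defined only by its input and output topics."""

    def handler(event: Event) -> None:
        bus.publish(output_topic, interpret(event.payload))

    bus.subscribe(input_topic, handler)


bus = EventBus()

# Toy components (assumptions): a fake speech recogniser and a naive
# "mention" detector that keeps capitalised words.
connect(bus, "audio.signal", "text.transcript",
        lambda audio: audio["utterance"])
connect(bus, "text.transcript", "text.mentions",
        lambda text: [word for word in text.split() if word.istitle()])

# A sensor posts a signal; downstream interpretations follow from the topics.
bus.publish("audio.signal", {"utterance": "Hello my name is Alice"})

topics = [event.topic for event in bus.log]
```

Because each component knows only its topics, a different recogniser could be swapped in by registering it on the same input and output topics, and the logs of two agent configurations could then be compared event by event.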