Daniel Oliveira

CRFeb 6, 2021Code

uTango: an open-source TEE for IoT devices

Daniel Oliveira, Tiago Gomes, Sandro Pinto

Security is one of the main challenges of the Internet of Things (IoT). IoT devices are mainly powered by low-cost microcontrollers (MCUs) that typically lack basic hardware security mechanisms to separate security-critical applications from less critical components. Recently, Arm has started to release Cortex-M MCUs enhanced with TrustZone technology (i.e., TrustZone-M), a system-wide security solution aiming at providing robust protection for IoT devices. Trusted Execution Environments (TEEs) relying on TrustZone hardware have been perceived as safe havens for securing mobile devices. However, for the past few years, considerable effort has gone into unveiling hundreds of vulnerabilities and proposing a collection of relevant defense techniques to address several issues. While new TEE solutions built on TrustZone-M start flourishing, the lessons gathered from the research community appear to be falling short, as these new systems are trapping into the same pitfalls of the past. In this paper, we present uTango, the first multi-world TEE for modern IoT devices. uTango proposes a novel architecture aiming at tackling the major architectural deficiencies currently affecting TrustZone(-M)-assisted TEEs. In particular, we leverage the very same TrustZone hardware primitives used by dual-world implementations to create multiple and equally secure execution environments within the normal world. We demonstrate the benefits of uTango by conducting an extensive evaluation on a real TrustZone-M hardware platform, i.e., Arm Musca-B1. uTango will be open-sourced and freely available on GitHub in hopes of engaging academia and industry on securing the foreseeable trillion IoT devices.

CVFeb 25

StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles

Daniel Oliveira, David Martins de Matos

Visual storytelling models that correctly ground entities in images may still hallucinate semantic relationships, generating incorrect dialogue attribution, character interactions, or emotional states. We introduce StoryMovie, a dataset of 1,757 stories aligned with movie scripts and subtitles through LCS matching. Our alignment pipeline synchronizes screenplay dialogue with subtitle timestamps, enabling dialogue attribution by linking character names from scripts to temporal positions from subtitles. Using this aligned content, we generate stories that maintain visual grounding tags while incorporating authentic character names, dialogue, and relationship dynamics. We fine-tune Qwen Storyteller3 on this dataset, building on prior work in visual grounding and entity re-identification. Evaluation using DeepSeek V3 as judge shows that Storyteller3 achieves an 89.9% win rate against base Qwen2.5-VL 7B on subtitle alignment. Compared to Storyteller, trained without script grounding, Storyteller3 achieves 48.5% versus 38.0%, confirming that semantic alignment progressively improves dialogue attribution beyond visual grounding alone.

Daniel Oliveira

2 Papers