TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning
The work addresses a domain-specific problem for design and verification engineers, but the contribution is incremental because it builds on existing multimodal models.
The paper introduces TD-Interpreter, a visual question-answer tool built on a fine-tuned multimodal model, to help engineers understand complex timing diagrams. The tool outperforms untuned GPT-4o by a large margin on the evaluated benchmarks.
We introduce TD-Interpreter, a specialized ML tool that assists engineers in understanding complex timing diagrams (TDs) originating from a third party during their design and verification process. TD-Interpreter is a visual question-answer environment that allows engineers to input a set of TDs and ask design and verification queries about them. We implemented TD-Interpreter with multimodal learning by fine-tuning LLaVA, a lightweight 7B Multimodal Large Language Model (MLLM). To address the limited availability of training data, we developed a synthetic data generation workflow that aligns visual information with its textual interpretation. Our experimental evaluation demonstrates the usefulness of TD-Interpreter, which outperformed untuned GPT-4o by a large margin on the evaluated benchmarks.