CV, LG · Jun 16, 2025

Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos

Dipayan Biswas, Shishir Shah, Jaspal Subhlok

arXiv:2506.13657v2 · 6.22 citations · h-index: 24 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a new dataset for researchers working on visual content detection in educational videos. The contribution is incremental, since it targets a single, specific domain.

The authors introduce the Lecture Video Visual Objects (LVVO) dataset, a benchmark for visual object detection in educational videos. It consists of 4,000 frames, of which 1,000 are manually annotated, with an inter-annotator F1 score of 83.41%.

We introduce the Lecture Video Visual Objects (LVVO) dataset, a new benchmark for visual object detection in educational video content. The dataset consists of 4,000 frames extracted from 245 lecture videos spanning biology, computer science, and geosciences. A subset of 1,000 frames, referred to as LVVO_1k, has been manually annotated with bounding boxes for four visual categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. Each frame was labeled independently by two annotators, resulting in an inter-annotator F1 score of 83.41%, indicating strong agreement. To ensure high-quality consensus annotations, a third expert reviewed and resolved all cases of disagreement through a conflict resolution process. To expand the dataset, a semi-supervised approach was employed to automatically annotate the remaining 3,000 frames, forming LVVO_3k. The complete dataset offers a valuable resource for developing and evaluating both supervised and semi-supervised methods for visual content detection in educational videos. The LVVO dataset is publicly available to support further research in this domain.
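The abstract reports an inter-annotator F1 score but does not say how boxes from the two annotators were matched. The sketch below shows one common way to compute such a score for a single frame. It rests on assumptions that are not taken from the paper: boxes are matched one-to-one and greedily by IoU, a match requires identical category labels, and the IoU threshold is 0.5. The names `iou` and `pairwise_f1` are hypothetical helpers, not part of any LVVO tooling.

```python
from typing import List, Tuple

# A box is (label, x1, y1, x2, y2) in pixel coordinates (assumed format).
Box = Tuple[str, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a[1:]
    bx1, by1, bx2, by2 = b[1:]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def pairwise_f1(boxes_a: List[Box], boxes_b: List[Box],
                iou_threshold: float = 0.5) -> float:
    """F1 agreement between two annotators on one frame.

    Assumption: greedy one-to-one matching in order of decreasing IoU,
    restricted to pairs that share the same category label.
    """
    candidates = sorted(
        ((iou(a, b), i, j)
         for i, a in enumerate(boxes_a)
         for j, b in enumerate(boxes_b)
         if a[0] == b[0]),
        reverse=True,
    )
    used_a, used_b = set(), set()
    matches = 0
    for score, i, j in candidates:
        if score < iou_threshold:
            break  # remaining candidates have even lower IoU
        if i in used_a or j in used_b:
            continue  # each box may be matched at most once
        used_a.add(i)
        used_b.add(j)
        matches += 1
    total = len(boxes_a) + len(boxes_b)
    # F1 = 2*TP / (2*TP + FP + FN) = 2*matches / (|A| + |B|)
    return 2 * matches / total if total else 1.0


annotator_1 = [("Table", 0, 0, 10, 10), ("Chart-Graph", 20, 20, 30, 30)]
annotator_2 = [("Table", 0, 0, 10, 9), ("Photographic-image", 20, 20, 30, 30)]
print(pairwise_f1(annotator_1, annotator_2))
```

In this toy frame the two annotators agree on the Table box but assign different categories to the second region, so only one of the four boxes pairs up. A dataset-level score could be obtained by summing matches and box counts over all frames before dividing; whether the paper aggregates this way is not stated.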

