Shingo Yokoi

h-index3
2papers

2 Papers

CVOct 18, 2022Code
5th Place Solution to Kaggle Google Universal Image Embedding Competition

Noriaki Ota, Shingo Yokoi, Shinsuke Yamaoka

In this paper, we present our solution, which placed 5th in the kaggle Google Universal Image Embedding Competition in 2022. We use the ViT-H visual encoder of CLIP from the openclip repository as a backbone and train a head model composed of BatchNormalization and Linear layers using ArcFace. The dataset used was a subset of products10K, GLDv2, GPR1200, and Food101. And applying TTA for part of images also improves the score. With this method, we achieve a score of 0.684 on the public and 0.688 on the private leaderboard. Our code is available. https://github.com/riron1206/kaggle-Google-Universal-Image-Embedding-Competition-5th-Place-Solution

CVOct 14, 2025Code
Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos

Shingo Yokoi, Kento Sasaki, Yu Yamaguchi

Recent advances in end-to-end (E2E) autonomous driving have been enabled by training on diverse large-scale driving datasets, yet autonomous driving models still struggle in out-of-distribution (OOD) scenarios. The COOOL benchmark targets this gap by encouraging hazard understanding beyond closed taxonomies, and the 2COOOL challenge extends it to generating human-interpretable incident reports. We present a hierarchical reasoning framework for incident report generation from dashcam videos that integrates frame-level captioning, incident frame detection, and fine-grained reasoning within vision-language models (VLMs). We further improve factual accuracy and readability through model ensembling and a Blind A/B Scoring selection protocol. On the official 2COOOL open leaderboard, our method ranks 2nd among 29 teams and achieves the best CIDEr-D score, producing accurate and coherent incident narratives. These results indicate that hierarchical reasoning with VLMs is a promising direction for accident analysis and for broader understanding of safety-critical traffic events. The implementation and code are available at https://github.com/riron1206/kaggle-2COOOL-2nd-Place-Solution.