CVAISep 12, 2025

An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars

arXiv:2509.09911v1h-index: 31Sci Rep
Originality Incremental advance
AI Analysis

It addresses the need for more interpretable and accurate models in high-stakes forensic applications like dental age estimation, though it is incremental as it builds on existing methods.

This study tackled the problem of limited transparency in deep learning for dental age estimation by developing a framework combining a convolutional autoencoder and Vision Transformer, which improved classification accuracy for second molars from 0.712 to 0.815 and for third molars from 0.462 to 0.543.

The practical adoption of deep learning in high-stakes forensic applications, such as dental age estimation, is often limited by the 'black box' nature of the models. This study introduces a framework designed to enhance both performance and transparency in this context. We use a notable performance disparity in the automated staging of mandibular second (tooth 37) and third (tooth 38) molars as a case study. The proposed framework, which combines a convolutional autoencoder (AE) with a Vision Transformer (ViT), improves classification accuracy for both teeth over a baseline ViT, increasing from 0.712 to 0.815 for tooth 37 and from 0.462 to 0.543 for tooth 38. Beyond improving performance, the framework provides multi-faceted diagnostic insights. Analysis of the AE's latent space metrics and image reconstructions indicates that the remaining performance gap is data-centric, suggesting high intra-class morphological variability in the tooth 38 dataset is a primary limiting factor. This work highlights the insufficiency of relying on a single mode of interpretability, such as attention maps, which can appear anatomically plausible yet fail to identify underlying data issues. By offering a methodology that both enhances accuracy and provides evidence for why a model may be uncertain, this framework serves as a more robust tool to support expert decision-making in forensic age estimation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes