May 28
Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

Pedro H. L. Leite, Pedro Benevenuto Valadares, Luiz W. P. Biscainho

Regional accent classification in Brazilian Portuguese (pt-BR) is limited by the lack of reliable accent labels. Large self-supervised learning (SSL) speech models are powerful, but their training pipelines dilute sociophonetic information, because accent labels are generally unreliable or absent from the training objectives. This work introduces a novel workflow for feature extraction that uses only acoustic labels. By isolating explicit regional accent landmarks and using a phoneme-based forced aligner (ZIPA), our targeted feature set captures dialectal variance more effectively than utterance-level embeddings, showing that localized features can outperform general-purpose architectures on accent-related tasks while relying on minimal, objective data labels.
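To make the idea of localized features concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a forced aligner has already produced `(phone, start_s, end_s)` tuples for an utterance; the landmark phones, the three per-segment features (duration, log energy, zero-crossing rate), and the function names are all hypothetical placeholders, since the abstract does not specify them, and no ZIPA API is used or implied.

```python
import math
from collections import defaultdict

# Hypothetical landmark phones, chosen only for illustration.
LANDMARKS = ("r", "s")


def segment_features(samples, sr, start, end):
    """Return (duration_s, log_energy, zero_crossing_rate) for one aligned phone."""
    seg = samples[int(start * sr):int(end * sr)]
    if len(seg) < 2:
        return None  # segment too short to describe
    energy = sum(x * x for x in seg) / len(seg)
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
    return (end - start, math.log(energy + 1e-12), crossings / (len(seg) - 1))


def landmark_vector(samples, sr, alignment, landmarks=LANDMARKS):
    """Average per-segment features per landmark phone into one fixed-length vector."""
    buckets = defaultdict(list)
    for phone, start, end in alignment:
        if phone in landmarks:
            feats = segment_features(samples, sr, start, end)
            if feats is not None:
                buckets[phone].append(feats)

    vector = []
    for phone in landmarks:
        rows = buckets.get(phone)
        if rows:
            # Column-wise mean over all occurrences of this phone.
            vector.extend(sum(col) / len(rows) for col in zip(*rows))
        else:
            vector.extend([0.0, 0.0, 0.0])  # phone absent from this utterance
    return vector
```

The resulting vector has a fixed length (three values per landmark phone) regardless of utterance duration, so it could be passed to any standard classifier in place of an utterance-level embedding; only phone identities and time boundaries are needed, not accent labels.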