9.9 ST Jun 2
Unbiased estimation of squared concentration in the Fisher-von Mises-Langevin distribution and the impossibility of unbiased concentration
Zain Jabbar, Yuqin Jiang, Andrey A. Popov
The estimation of the concentration parameter in the Fisher-von Mises-Langevin distribution is the directional statistics analogue of the estimation of the precision matrix for the Gaussian distribution. In this work we show that unbiased estimation of this parameter is impossible. With this realization in hand, we provide an alternative parameterization of the Fisher-von Mises-Langevin distribution in terms of the squared concentration, which we term the intensity. We further show that unbiased estimation of the intensity is possible, and provide (almost) unbiased estimators of it in terms of a partial sum U-statistic. We showcase our new estimator on synthetic data, New York taxi trip data, and spherical word embeddings.
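To illustrate why a squared quantity can admit an unbiased U-statistic estimator when a plug-in estimate does not, here is a minimal sketch. It is not the estimator from the paper: it estimates the squared norm of the mean direction, ||E[x]||^2, by averaging dot products over distinct pairs of samples. The function names `pairwise_u_statistic` and `plug_in_estimate` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def pairwise_u_statistic(x):
    """Unbiased estimate of ||E[x]||^2 from i.i.d. rows of x (needs n >= 2).

    Illustrative only; this is NOT the paper's intensity estimator.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two samples")
    s = x.sum(axis=0)
    # sum over i != j of <x_i, x_j> equals ||sum_i x_i||^2 - sum_i ||x_i||^2
    off_diagonal_sum = s @ s - np.sum(x * x)
    return off_diagonal_sum / (n * (n - 1))


def plug_in_estimate(x):
    """Biased plug-in estimate ||mean(x)||^2, shown for comparison."""
    m = np.asarray(x, dtype=float).mean(axis=0)
    return m @ m


# Two orthogonal unit vectors: the only distinct pair has dot product 0.
samples = np.array([[1.0, 0.0], [0.0, 1.0]])
u_value = pairwise_u_statistic(samples)
plug_value = plug_in_estimate(samples)
```

For independent samples, each off-diagonal dot product has expectation ||E[x]||^2, so their average is unbiased, whereas the plug-in value also includes the diagonal terms and is biased upward.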
LG Oct 27, 2025
AI based signage classification for linguistic landscape studies
Yuqin Jiang, Song Jiang, Jacob Algrim et al.
Linguistic Landscape (LL) research traditionally relies on manual photography and annotation of public signage to examine the distribution of languages in urban space. While such methods yield valuable findings, the process is time-consuming and difficult to apply to large study areas. This study explores the use of an AI-powered language detection method to automate LL analysis. Using Honolulu Chinatown as a case study, we constructed a georeferenced photo dataset of 1,449 images collected by researchers and applied AI for optical character recognition (OCR) and language classification. We also conducted manual validation to check accuracy. The model achieved an overall accuracy of 79%. Five recurring types of mislabeling were identified: distortion, reflection, degraded surface, graffiti, and hallucination. The analysis also reveals that the AI model treats all regions of an image equally, detecting peripheral or background text that human interpreters typically ignore. Despite these limitations, the results demonstrate the potential of integrating AI-assisted workflows into LL research to reduce such time-consuming processes. However, given these limitations and mislabels, we recognize that AI cannot be fully trusted in this process. This paper encourages a hybrid approach combining AI automation with human validation for a more reliable and efficient workflow.
LG Sep 25, 2025
Downscaling human mobility data based on demographic socioeconomic and commuting characteristics using interpretable machine learning methods
Yuqin Jiang, Andrey A. Popov, Tianle Duan et al.
Understanding urban human mobility patterns at various spatial levels is essential for social science. This study presents a machine learning framework to downscale origin-destination (OD) taxi trip flows in New York City from a larger spatial unit to a smaller spatial unit. First, relationships between OD trips and demographic, socioeconomic, and commuting characteristics are modeled using four methods: Linear Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Neural Networks (NN). Second, a perturbation-based sensitivity analysis is applied to interpret variable importance for the nonlinear models. The results show that the linear regression model fails to capture the complex variable interactions. While NN performs best on the training and testing datasets, SVM shows the best generalization ability in downscaling performance. The methodology presented in this study provides both an analytical advancement and practical applications for improving transportation services and urban development.
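A perturbation-based sensitivity analysis of the kind mentioned above can be sketched as follows. The abstract does not specify the perturbation scheme, so this sketch assumes one: each feature is shifted by a fraction of its standard deviation, and the score is the mean absolute change in the model's predictions. The function name `perturbation_sensitivity` and the `scale` parameter are hypothetical.

```python
import numpy as np


def perturbation_sensitivity(predict, X, scale=0.1):
    """Score each feature by how much shifting it changes the predictions.

    predict: callable mapping an (n, d) array to n predictions.
    X:       (n, d) feature matrix.
    scale:   assumed perturbation size, as a fraction of each feature's std.
    """
    X = np.asarray(X, dtype=float)
    baseline = np.asarray(predict(X), dtype=float)
    stds = X.std(axis=0)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perturbed = X.copy()
        X_perturbed[:, j] += scale * stds[j]  # shift only feature j
        shifted = np.asarray(predict(X_perturbed), dtype=float)
        scores[j] = np.mean(np.abs(shifted - baseline))
    return scores


# Toy model that depends only on the first feature.
def toy_model(X):
    return 3.0 * X[:, 0]


X_demo = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
demo_scores = perturbation_sensitivity(toy_model, X_demo)
```

Because the procedure only calls `predict`, it applies equally to RF, SVM, or NN models; features the model ignores receive a score of zero.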