LG, AI, CL, SI · Sep 14, 2025

Agentic Username Suggestion and Multimodal Gender Detection in Online Platforms: Introducing the PNGT-26K Dataset

arXiv:2509.11136v1 · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This paper addresses digital identity creation for Persian-speaking users on online platforms. The contribution is incremental: it builds on existing methods, adding new data and domain-specific applications.

The research tackles gender detection and username suggestion for Persian names. It introduces the PNGT-26K dataset of approximately 26,000 tuples and develops two frameworks: Open Gender Detection, for probabilistic gender guessing, and Nominalist, for agentic username suggestion.

Persian names present unique challenges for natural language processing applications, particularly in gender detection and digital identity creation, due to transliteration inconsistencies and culturally specific naming patterns. Existing tools exhibit significant performance degradation on Persian names, and the scarcity of comprehensive datasets further compounds these limitations. To address these challenges, the present research introduces PNGT-26K, a comprehensive dataset of Persian names, their commonly associated gender, and their English transliteration, consisting of approximately 26,000 tuples. As a demonstration of how this resource can be utilized, we also introduce two frameworks, namely Open Gender Detection and Nominalist. Open Gender Detection is a production-grade, ready-to-use framework that uses existing data from a user, such as profile photo and name, to give a probabilistic guess about the person's gender. Nominalist, the second framework introduced by this paper, utilizes agentic AI to help users choose a username for their social media accounts on any platform. It can be easily integrated into any website to provide a better user experience. The PNGT-26K dataset and the Nominalist and Open Gender Detection frameworks are publicly available on GitHub.
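To make the tuple structure concrete, here is a minimal sketch of how a dataset of (Persian name, gender, transliteration) tuples could back a name-only probabilistic gender guess. It is not the Open Gender Detection implementation, which also uses signals such as the profile photo. The `NameRecord` type, the `"M"`/`"F"` labels, the function names, and the sample rows are all assumptions made for illustration; the rows are not copied from PNGT-26K.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class NameRecord(NamedTuple):
    """Hypothetical shape of one dataset tuple."""
    persian: str          # name in Persian script
    gender: str           # commonly associated gender label (assumed "M" / "F")
    transliteration: str  # one English spelling of the name


# Illustrative rows only; NOT taken from PNGT-26K.
SAMPLE_ROWS = [
    NameRecord("علی", "M", "Ali"),
    NameRecord("زهرا", "F", "Zahra"),
    NameRecord("رضا", "M", "Reza"),
]


def normalize(name: str) -> str:
    """Trim whitespace and fold case so 'Ali', ' ali ' and 'ALI' match."""
    return name.strip().casefold()


def build_index(rows: list[NameRecord]) -> dict[str, Counter]:
    """Map every spelling (Persian or English) to a count of gender labels."""
    index: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        index[normalize(row.persian)][row.gender] += 1
        index[normalize(row.transliteration)][row.gender] += 1
    return index


def gender_probabilities(name: str, index: dict[str, Counter]) -> Optional[dict[str, float]]:
    """Return label frequencies for a name, or None if the name is unknown."""
    counts = index.get(normalize(name))
    if not counts:
        return None
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    index = build_index(SAMPLE_ROWS)
    print(gender_probabilities("ali", index))
    print(gender_probabilities("زهرا", index))
    print(gender_probabilities("Unknown", index))
```

Returning `None` for unseen names, instead of a default guess, leaves room for a caller to fall back on another signal. Indexing both the Persian spelling and the transliteration is one simple way to tolerate input in either script; it does not address the transliteration inconsistencies the abstract mentions.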

