Hindi Question Generation Using Dependency Structures
This work addresses a data scarcity problem for Hindi NLP applications, but it is incremental because it applies existing methods to a specific language.
The paper tackles the lack of data for Hindi question answering by developing a rule-based system for automatic question generation, using dependency structures and semantic filters to produce diverse questions, with results showing significantly more questions generated than input sentences.
Hindi question answering systems suffer from a lack of data. To address this, the paper presents an approach to automatic question generation. We present a rule-based system for question generation in Hindi that formalizes question transformation methods based on karaka-dependency theory. We use a Hindi dependency parser to mark the karaka roles, and we use IndoWordNet, a Hindi ontology, to detect the semantic category of the karaka role heads and so select the interrogatives. We analyze how a single sentence can yield multiple generations from the rule for the same karaka role. For evaluation, the generations are manually annotated by multiple annotators on semantic and syntactic scales. Further, we constrain generation with various semantic and syntactic filters to improve its quality. Using these methods, we are able to generate diverse questions, significantly more than the number of sentences fed to the system.
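The pipeline described above (karaka roles from a parser, a semantic category for each role head, then an interrogative substituted for the chunk) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's actual implementation: the `Chunk` class, the `SEMANTIC_CATEGORY` dictionary standing in for an IndoWordNet lookup, the `RULES` table, and the hand-written parse of the example sentence are all hypothetical. The sketch also ignores agreement and word-order adjustments that a real system would need.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str    # surface form of the chunk, including any postposition
    head: str    # head word of the chunk
    karaka: str  # dependency label, e.g. "k1", "k2", "k7t", "k7p", "root"


# Toy stand-in for an ontology lookup (hypothetical categories, not IndoWordNet output).
SEMANTIC_CATEGORY = {
    "राम": "person",    # Ram
    "कल": "time",       # yesterday
    "बाज़ार": "place",   # market
    "किताब": "object",  # book
}

# Hypothetical rule table: (karaka role, semantic category) -> interrogative.
# The k1 rule assumes an ergative-marked karta ("ने"), so "किसने" replaces the whole chunk.
RULES = {
    ("k1", "person"): "किसने",  # who (ergative)
    ("k7t", "time"): "कब",      # when
    ("k7p", "place"): "कहाँ",   # where
}


def generate_questions(chunks):
    """Replace one chunk at a time with the interrogative its rule selects."""
    questions = []
    for i, chunk in enumerate(chunks):
        category = SEMANTIC_CATEGORY.get(chunk.head)
        interrogative = RULES.get((chunk.karaka, category))
        if interrogative is None:
            # Acts as a simple filter: no rule for this role/category pair.
            continue
        words = [interrogative if j == i else c.text for j, c in enumerate(chunks)]
        questions.append(" ".join(words) + "?")
    return questions


# Hand-written parse of "राम ने कल बाज़ार में किताब खरीदी" (Ram bought a book at the market yesterday).
sentence = [
    Chunk("राम ने", "राम", "k1"),
    Chunk("कल", "कल", "k7t"),
    Chunk("बाज़ार में", "बाज़ार", "k7p"),
    Chunk("किताब", "किताब", "k2"),
    Chunk("खरीदी", "खरीदी", "root"),
]

questions = generate_questions(sentence)
for question in questions:
    print(question)
```

In this toy setup a single input sentence yields one question per chunk that matches a rule, which mirrors the abstract's observation that the number of generated questions can exceed the number of input sentences. The `k2` chunk produces no question here only because the hypothetical rule table has no entry for it.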