Suman Kalyan Maity

CL
h-index: 15
11 papers
1,157 citations
Novelty: 35%
AI Score: 46

11 Papers

CL, Jul 30, 2025, Code
PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs

Homaira Huda Shomee, Suman Kalyan Maity, Sourav Medya

Large language models (LLMs) have emerged as transformative approaches in several important fields. This paper aims for a paradigm shift for patent writing by leveraging LLMs to overcome the tedious patent-filing process. In this work, we present PATENTWRITER, the first unified benchmarking framework for evaluating LLMs in patent abstract generation. Given the first claim of a patent, we evaluate six leading LLMs -- including GPT-4 and LLaMA-3 -- under a consistent setup spanning zero-shot, few-shot, and chain-of-thought prompting strategies to generate the abstract of the patent. Our benchmark PATENTWRITER goes beyond surface-level evaluation: we systematically assess the output quality using a comprehensive suite of metrics -- standard NLP measures (e.g., BLEU, ROUGE, BERTScore), robustness under three types of input perturbations, and applicability in two downstream patent classification and retrieval tasks. We also conduct stylistic analysis to assess length, readability, and tone. Experimental results show that modern LLMs can generate high-fidelity and stylistically appropriate patent abstracts, often surpassing domain-specific baselines. Our code and dataset are open-sourced to support reproducibility and future research.

SI, Mar 27
ParsCN: A Persian Dataset for Counter-Narrative Generation to Combat Online Hate Speech

Zahra Safdari Fesaghandis, Suman Kalyan Maity

Online hate speech threatens online civility, particularly in low-resource and multilingual environments. Counter-narratives offer a promising solution by promoting constructive responses to hate speech. However, automatic counter-narrative generation is hindered by the lack of high-quality data for low-resource languages like Persian. To bridge this gap, we introduce ParsCN, the first and most comprehensive Persian counter-narrative dataset. Consisting of 1,100 hate speech and counter-narrative pairs, it provides fine-grained annotations across six target groups and six countering strategies, tailored to the socio-cultural context of Persian online discourse. We propose a novel, scalable multi-stage framework that integrates culturally informed human annotation with few-shot LLM-augmented generation, guided by semantic retrieval and rigorous manual curation. This approach enables the creation of diverse, high-quality counter-narratives while significantly reducing annotation costs - establishing a replicable paradigm for other low-resource settings. Comprehensive human and automatic evaluations confirm the quality of the dataset and the effectiveness of the generated responses. Human-written counter-narratives achieved the highest scores for relevance (4.23), effectiveness (4.21), fluency (4.92), and tone appropriateness (4.79), with GPT-4o and Claude closely following. Automatic evaluations show strong semantic alignment, high lexical diversity, and low toxicity across all sources. Finally, we conduct benchmark evaluations using mBART and PersianMind on a held-out test set. Results reveal that existing models struggle with fluency, cultural nuance, and safety - highlighting the need for Persian-specific resources like ParsCN. Our dataset serves as a foundational benchmark to advance research on Persian counter-narrative generation and foster safer, more inclusive digital spaces.

CL, Mar 1
Multilingual Hate Speech Detection and Counterspeech Generation: A Comprehensive Survey and Practical Guide

Zahra Safdari Fesaghandis, Suman Kalyan Maity

Combating online hate speech in multilingual settings requires approaches that go beyond English-centric models and capture the cultural and linguistic diversity of global online discourse. This paper presents a comprehensive survey and practical guide to multilingual hate speech detection and counterspeech generation, integrating recent advances in natural language processing. We analyze why monolingual systems often fail in non-English and code-mixed contexts, missing implicit hate and culturally specific expressions. To address these challenges, we outline a structured three-phase framework - task design, data curation, and evaluation - drawing on state-of-the-art datasets, models, and metrics. The survey consolidates progress in multilingual resources and techniques while highlighting persistent obstacles, including data scarcity in low-resource languages, fairness and bias in system development, and the need for multimodal solutions. By bridging technical progress with ethical and cultural considerations, we provide researchers, practitioners, and policymakers with scalable guidelines for building context-aware, inclusive systems. Our roadmap contributes to advancing online safety through fairer, more effective detection and counterspeech generation across diverse linguistic environments.

SI, Sep 10, 2019
Competing Topic Naming Conventions in Quora: Predicting Appropriate Topic Merges and Winning Topics from Millions of Topic Pairs

Binny Mathew, Suman Kalyan Maity, Pawan Goyal et al.

Quora is a popular Q&A site that lets users tag questions with multiple relevant topics, which helps attract quality answers. These topics are not predefined but are user-defined conventions, and it is not rare for multiple such conventions describing exactly the same concept to coexist in the Quora ecosystem. In almost all such cases, users (or Quora moderators) manually merge the topic pair into one of the two topics, thus selecting one of the competing conventions. An important application for the site is therefore to identify early those competing conventions that should merge in the future. In this paper, we propose a two-step approach that uniquely combines anomaly detection and supervised classification frameworks to predict whether two topics from among millions of topic pairs are indeed competing conventions and should merge, achieving an F-score of 0.711. We also develop a model to predict the direction of the topic merge, i.e., the winning convention, achieving an F-score of 0.898. Our system is also able to predict ~25% of the correct merges within the first month of the merge and ~40% within a year. This is an encouraging result, since Quora users take 936 days on average to identify such a correct merge. Human judgment experiments show that our system predicts almost all the correct cases that humans can predict, plus 37.24% correct cases that humans are not able to identify at all.

SI, Mar 10, 2019
DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow

Suman Kalyan Maity, Abhishek Panigrahi, Sayan Ghosh et al.

In this paper, we develop DeepTagRec, a content-cum-user based deep learning framework that recommends appropriate question tags on Stack Overflow. The proposed system learns the content representation from the question title and body. Subsequently, the representation learnt from the heterogeneous relationship between users and tags is fused with the content representation for the final tag prediction. On a very large-scale dataset comprising half a million question posts, DeepTagRec beats all the baselines; in particular, it significantly outperforms the best-performing baseline, TagCombine, achieving an overall gain of 60.8% and 36.8% in precision@3 and recall@10, respectively. DeepTagRec also achieves 63% and 33.14% maximum improvement in exact-k accuracy and top-k accuracy, respectively, over TagCombine.
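For readers unfamiliar with the ranking metrics named in this abstract, the following is a minimal sketch of precision@k and recall@k under their most common definitions. The function names and the example tags are illustrative assumptions, and the paper's exact formulation (for instance, how posts with fewer than k true tags are handled) may differ.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended tags that are true tags."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for tag in ranked[:k] if tag in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the true tags that appear among the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # assumption: a post with no true tags contributes zero recall
    hits = sum(1 for tag in ranked[:k] if tag in relevant)
    return hits / len(relevant)


# Hypothetical example: a ranked recommendation list and the post's true tags.
ranked_tags = ["python", "pandas", "numpy", "java", "csv"]
true_tags = {"python", "csv", "dataframe"}
p_at_3 = precision_at_k(ranked_tags, true_tags, 3)   # 1 hit in the top 3
r_at_10 = recall_at_k(ranked_tags, true_tags, 10)    # 2 of 3 true tags found
```

In an evaluation over many posts, these per-post values would typically be averaged; that averaging step is omitted here.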

SI, Nov 17, 2018
Deep Dive into Anonymity: A Large Scale Analysis of Quora Questions

Binny Mathew, Ritam Dutt, Suman Kalyan Maity et al.

Anonymity forms an integral and important part of our digital life. It enables us to express our true selves without the fear of judgment. In this paper, we investigate the different aspects of anonymity in the social Q&A site Quora. The choice of Quora is motivated by the fact that it is one of the rare social Q&A sites that allow users to explicitly post anonymous questions, and such activity in this forum has become normative rather than taboo. Through an analysis of 5.1 million questions, we observe that at a global scale almost no difference manifests between the linguistic structure of the anonymous and the non-anonymous questions. We find topical mixing at the global scale to be the primary reason for this absence. However, the differences start to appear once we "deep dive" and (topically) cluster the questions and compare the clusters that have high volumes of anonymous questions with those that have low volumes of anonymous questions. In particular, we observe that the choice to post a question anonymously depends on the user's perception of anonymity, and users often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity. We further perform personality trait analysis and observe that the anonymous group of users has a positive correlation with extraversion and agreeableness, and a negative correlation with openness. Subsequently, to gain further insights, we build an anonymity grid to identify the differences in the perception of anonymity between the user posting the question and the community of users answering it. We also look into the first response time of the questions and observe that it is lowest for topics that deal with personal and sensitive issues, which hints toward a higher degree of community support and user engagement.

SI, Feb 14, 2018
Understanding Book Popularity on Goodreads

Suman Kalyan Maity, Ayush Kumar, Ankan Mullick et al.

Goodreads has run the Readers Choice Awards since 2009, in which users can nominate and vote for books of their choice released in the given year. In this work, we ask whether the number of votes that a book would receive (i.e., the popularity of the book) can be predicted from the characteristics of various entities on Goodreads. We predict the popularity of the books with high accuracy (correlation coefficient ~0.61) and low RMSE (~1.25). User engagement and the author's prestige are found to be crucial factors for book popularity.
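The two evaluation figures quoted in this abstract can be computed as in the sketch below. It assumes the correlation coefficient is Pearson's r, which the abstract does not state, and the function names and sample numbers are illustrative only, not data from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed here; the abstract does not name it)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers purely to exercise the functions.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error = rmse(observed, predicted)
perfect = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Reporting both values is useful because correlation captures how well the ranking of predictions tracks the observations, while RMSE captures the size of the errors.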

CL, Feb 1, 2018
Adapting predominant and novel sense discovery algorithms for identifying corpus-specific sense differences

Binny Mathew, Suman Kalyan Maity, Pratip Sarkar et al.

Word senses are not static and may have temporal, spatial or corpus-specific scopes. Identifying such scopes could substantially benefit existing WSD systems. In this paper, while studying corpus-specific word senses, we adapt three existing predominant and novel-sense discovery algorithms to identify these corpus-specific senses. We make use of text data available in the form of millions of digitized books and newspaper archives as two different sources of corpora and propose automated methods to identify corpus-specific word senses at various time points. We conduct an extensive human judgment experiment to rigorously evaluate and compare the performance of these approaches. Post adaptation, the outputs of the three algorithms are in the same format and the accuracy results are also comparable, with roughly 45-60% of the reported corpus-specific senses being judged as genuine.

LG, Apr 11, 2017
ENWalk: Learning Network Features for Spam Detection in Twitter

K C Santosh, Suman Kalyan Maity, Arjun Mukherjee

Social media platforms are growing in influence, and the vast amount of public information they carry has led companies and organizations to use them actively for marketing. Unlike promotions in traditional media such as TV and newspapers, such marketing promotions are difficult to identify. It is therefore important to identify the promoters in social media. Although research in this area is active and ongoing, existing approaches are far from solving the problem. To identify such imposters, it is important to understand their strategies of social circle creation and the dynamics of their content posting. Are there specific spammer types? How successful is each type? We analyze these questions in the light of social relationships on Twitter. Our analyses uncover two types of spammers and their relationships with the dynamics of content posts. Our results reveal novel spamming dynamics that are intuitive and arguable. We propose ENWalk, a framework that detects spammers by learning feature representations of social media users. We learn the feature representations using random walks biased by the spam dynamics. Experimental results on a large-scale Twitter network and the corresponding tweets show the effectiveness of our approach, which outperforms existing approaches.
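The abstract does not say how the random walks are biased, so the following is only a generic sketch of a biased random walk on a graph, not ENWalk itself. The per-node `bias` scores, the function name, and the toy graph are hypothetical stand-ins for whatever spam-dynamics signal the paper uses.

```python
import random
from typing import Dict, Hashable, List


def biased_random_walk(
    graph: Dict[Hashable, List[Hashable]],
    bias: Dict[Hashable, float],
    start: Hashable,
    length: int,
    rng: random.Random,
) -> List[Hashable]:
    """Walk up to `length` nodes, choosing each next hop with probability
    proportional to the (hypothetical) bias score of the neighbor."""
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: stop the walk early
        # Neighbors without a score default to weight 1.0 (an assumption).
        weights = [bias.get(node, 1.0) for node in neighbors]
        current = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(current)
    return walk


# Toy graph: node "c" has zero bias, so the walk never steps onto it.
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
toy_bias = {"a": 1.0, "b": 1.0, "c": 0.0}
toy_walk = biased_random_walk(toy_graph, toy_bias, "a", 6, random.Random(0))
```

In walk-based representation learning, sequences like `toy_walk` are typically fed to a skip-gram style model to obtain node embeddings; that step is omitted here.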

CL, Mar 11, 2017
Language Use Matters: Analysis of the Linguistic Structure of Question Texts Can Characterize Answerability in Quora

Suman Kalyan Maity, Aman Kharb, Animesh Mukherjee

Quora is one of the most popular community Q&A sites of recent times. However, many question posts on this Q&A site do not get answered. In this paper, we quantify various linguistic activities that discriminate an answered question from an unanswered one. Our central finding is that the way users use language while writing the question text can be a very effective means to characterize answerability. This characterization helps us predict early whether a question that has remained unanswered for a specific time period t will eventually be answered, achieving an accuracy of 76.26% (t = 1 month) and 68.33% (t = 3 months). Notably, features representing the language use patterns of the users are the most discriminative and alone account for an accuracy of 74.18%. We also compare our method with similar works (Dror et al., Yang et al.), achieving a maximum improvement of ~39% in terms of accuracy.

CL, Jan 31, 2016
WASSUP? LOL : Characterizing Out-of-Vocabulary Words in Twitter

Suman Kalyan Maity, Chaitanya Sarda, Anshit Chaudhary et al.

Language in social media is mostly driven by new words and spellings that are constantly entering the lexicon thereby polluting it and resulting in high deviation from the formal written version. The primary entities of such language are the out-of-vocabulary (OOV) words. In this paper, we study various sociolinguistic properties of the OOV words and propose a classification model to categorize them into at least six categories. We achieve 81.26% accuracy with high precision and recall. We observe that the content features are the most discriminative ones followed by lexical and context features.