DLJun 2
A Double Bind: Gendered Funding, Research Topics, and Academic Performance in The Social SciencesYang Ding, Ning Zhang, Helen Bao et al.
While female representation in social sciences is increasing, systemic gender disparities may persist in research funding and academic performance. Some argue that female scholars now receive equal opportunities, yet evidence suggests that gender imbalances remain, particularly in specific research areas. This study examines 12,945 National Science Foundation (NSF)-funded principal investigators in social sciences from 2000 to 2019 to assess gender disparities in grant allocation, research topics, and post-award academic performance. Findings reveal a dual imbalance. First, despite similar overall funding success rates, female scholars remain underrepresented in high-impact and traditionally male-dominated research topics. Males dominate most funded topics, especially STEM-related ones, while female-led topics align with traditional gender stereotypes. Second, post-award performance patterns suggest that females outperform males in male-dominated fields, whereas males excel in female-dominated ones, undermining any presumed advantage of female scholars in their own research areas. These disparities contribute to the risk of both genders prematurely exiting the science pipeline. Furthermore, early-career experiences shape these outcomes asymmetrically: postdoctoral experience benefits both genders in female-dominated fields, with stronger effects for males, but disadvantages females in male-dominated fields by reducing their output and citation impact. Longer postdoctoral tenure enhances male researchers' citation impact across all fields but has mixed effects for females depending on field gender composition. These findings underscore the need for policies that address not just overall funding equality, but also gendered disparities across research topics and career trajectories.
DLApr 10, 2025
Dynamic disruption index across citation and cited references windows: Recommendations for thresholds in research evaluationHongkan Chen, Lutz Bornmann, Yi Bu
The temporal dimension of citation accumulation poses fundamental challenges for quantitative research evaluations, particularly in assessing disruptive and consolidating research through the disruption index (D). While prior studies emphasize minimum citation windows (mostly 3-5 years) for reliable citation impact measurements, the time-sensitive nature of D - which quantifies a paper' s capacity to eclipse prior knowledge - remains underexplored. This study addresses two critical gaps: (1) determining the temporal thresholds required for publications to meet citation/reference prerequisites, and (2) identifying "optimal" citation windows that balance early predictability and longitudinal validity. By analyzing millions of publications across four fields with varying citation dynamics, we employ some metrics to track D stabilization patterns. Key findings reveal that a 10-year window achieves >80% agreement with final D classifications, while shorter windows (3 years) exhibit instability. Publications with >=30 references stabilize 1-3 years faster, and extreme cases (top/bottom 5% D values) become identifiable within 5 years - enabling early detection of 60-80% of highly disruptive and consolidating works. The findings offer significant implications for scholarly evaluation and science policy, emphasizing the need for careful consideration of citation window length in research assessment (based on D).
AIJul 11, 2023
Has China caught up to the US in AI research? An exploration of mimetic isomorphism as a model for late industrializersChao Min, Yi Zhao, Yi Bu et al.
Artificial Intelligence (AI), a cornerstone of 21st-century technology, has seen remarkable growth in China. In this paper, we examine China's AI development process, demonstrating that it is characterized by rapid learning and differentiation, surpassing the export-oriented growth propelled by Foreign Direct Investment seen in earlier Asian industrializers. Our data indicates that China currently leads the USA in the volume of AI-related research papers. However, when we delve into the quality of these papers based on specific metrics, the USA retains a slight edge. Nevertheless, the pace and scale of China's AI development remain noteworthy. We attribute China's accelerated AI progress to several factors, including global trends favoring open access to algorithms and research papers, contributions from China's broad diaspora and returnees, and relatively lax data protection policies. In the vein of our research, we have developed a novel measure for gauging China's imitation of US research. Our analysis shows that by 2018, the time lag between China and the USA in addressing AI research topics had evaporated. This finding suggests that China has effectively bridged a significant knowledge gap and could potentially be setting out on an independent research trajectory. While this study compares China and the USA exclusively, it's important to note that research collaborations between these two nations have resulted in more highly cited work than those produced by either country independently. This underscores the power of international cooperation in driving scientific progress in AI.
DLApr 19
Academic match-makers in sociology: Their role in collaboration network formationHongkan Chen, Qingshan Zhou, Robin Haunschild et al.
In modern scientific collaboration networks, certain researchers play a pivotal role in bridging scholars who have never worked together - a phenomenon we term academic "match-makers." Despite their potential importance, the prevalence, characteristics, benefits, and long-term trajectory of these individuals remain underexplored. Using the Microsoft Academic Graph (MAG), we operationalized a match-maker as an author who, in a given publication, introduced a first-time collaboration between two co-authors, each of whom had previously collaborated with the match-maker but not with each other. We employed a configuration null model to distinguish observed patterns from random chance. Our findings reveal that the match-maker phenomenon is deliberate, prevalent, and consequential. Among authors with over 20 publications, nearly 30% have served as a match-maker, and the probability of acting as one increased eightfold from 1980 to 2019. Publications involving a match-maker are more likely to appear in high-impact journals and exhibit higher disruptiveness - particularly in larger teams - suggesting that match-makers help facilitate what we term integrative disruption. Match-makers tend to emerge early in their careers, peaking around the 20th publication and at an academic age of roughly ten years. While nearly all match-makers eventually experience "abandonment" in the sense that the connected researchers later collaborate without them, their continued involvement remains substantial and is driven by research needs rather than structural factors. This reframes abandonment not as exclusion but as a natural evolution within project-based collaborations. The academic match-maker phenomenon is a strategic feature of collaboration networks characterized by early-career emergence, context-dependent persistence, and tangible contributions to high-impact, disruptive research.
CLSep 8, 2022
Exploring the Distribution Regularities of User Attention and Sentiment toward Product Aspects in Online ReviewsChenglei Qin, Chengzhi Zhang, Yi Bu
[Purpose] To better understand the online reviews and help potential consumers, businessmen, and product manufacturers effectively obtain users' evaluation on product aspects, this paper explores the distribution regularities of user attention and sentiment toward product aspects from the temporal perspective of online reviews. [Design/methodology/approach] Temporal characteristics of online reviews (purchase time, review time, and time intervals between purchase time and review time), similar attributes clustering, and attribute-level sentiment computing technologies are employed based on more than 340k smartphone reviews of three products from JD.COM (a famous online shopping platform in China) to explore the distribution regularities of user attention and sentiment toward product aspects in this article. [Findings] The empirical results show that a power-law distribution can fit user attention to product aspects, and the reviews posted in short time intervals contain more product aspects. Besides, the results show that the values of user sentiment of product aspects are significantly higher/lower in short time intervals which contribute to judging the advantages and weaknesses of a product. [Research limitations] The paper can't acquire online reviews for more products with temporal characteristics to verify the findings because of the restriction on reviews crawling by the shopping platforms. [Originality/value] This work reveals the distribution regularities of user attention and sentiment toward product aspects, which is of great significance in assisting decision-making, optimizing review presentation, and improving the shopping experience.
CLApr 19
RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented GenerationJuhyeon Lee, Wonduk Seo, Junseo Koh et al.
Detecting harmful content in multi turn dialogue requires reasoning over the full conversational context rather than isolated utterances. However, most existing methods rely mainly on models internal parametric knowledge, without explicit grounding in external normative principles. This often leads to inconsistent judgments in socially nuanced contexts, limited interpretability, and redundant reasoning across turns. To address this, we propose RoTRAG, a retrieval augmented framework that incorporates concise human written moral norms, called Rules of Thumb (RoTs), into LLM based harm assessment. For each turn, RoTRAG retrieves relevant RoTs from an external corpus and uses them as explicit normative evidence for turn level reasoning and final severity classification. To improve efficiency, we further introduce a lightweight binary routing classifier that decides whether a new turn requires retrieval grounded reasoning or can reuse existing context. Experiments on ProsocialDialog and Safety Reasoning Multi Turn Dialogue show that RoTRAG consistently improves both harm classification and severity estimation over competitive baselines, with an average relative gain of around 40% in F1 across benchmark datasets and an average relative reduction of 8.4% in distributional error, while reducing redundant computation without sacrificing performance.
DLApr 16
A Semantic Geometry for Uncovering Paradigm Dynamics via Scientific PublicationsJinchang Liu, Qingshan Zhou, Hongkan Chen et al.
Science advances not only by accumulating discovered patterns but by changing how new problems and solutions are expressed. While structural indicators track scholarly attention, they offer only an indirect proxy for the reorganization of meaning. We propose a semantic geometry based on the R-P-C (references, focal publication, and citing publications) framework to quantify how a publication positions itself relative to its knowledge base and diffusion. This geometry identifies three publication types: consolidating, exploratory and balanced. Our results show that the semantic similarity and distance between a publication's knowledge base and diffusion serve as a mechanistic explanation for disruption, with novelty (atypical reference combinations) acting as an antecedent disturbance that triggers a semantic rupture. This is related to team size, where small teams preserve a higher potential for exploratory departures while large collaborations systematically align with paradigmatic consolidation. Crucially, this geometry explains why citation trajectories differ; consolidating research earns rapid recognition by lowering comprehension costs, while exploratory work faces high paradigm conversion costs that result in slower, more selective diffusion. Collectively, this R-P-C framework provides a robust instrument for monitoring the dynamic of scientific paradigms.
CLSep 2, 2025Code
Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt OptimizationJuhyeon Lee, Wonduk Seo, Hyunjin An et al.
Automatic prompt optimization has recently emerged as a strategy for improving the quality of prompts used in Large Language Models (LLMs), with the goal of generating more accurate and useful responses. However, most prior work focuses on direct prompt refinement or model fine-tuning, overlooking the potential of leveraging LLMs' inherent reasoning capability to learn from contrasting examples. In this paper, we present Contrastive Reasoning Prompt Optimization (CRPO), a novel framework that formulates prompt optimization as a retrieval-augmented reasoning process. Our approach retrieves top k reference prompt-response pairs from the HelpSteer2 dataset, an open source collection where each response is annotated for helpfulness, correctness, coherence, complexity, and verbosity, and constructs two complementary optimization paradigms: (1) tiered contrastive reasoning, where the LLM compares high-, medium-, and low-quality exemplars (both prompts and responses) to refine its own generation through reflective reasoning, and (2) multi-metric contrastive reasoning, where the LLM analyzes the best exemplars along each evaluation dimension and integrates their strengths into an optimized prompt. By explicitly contrasting high and low quality exemplars, CRPO enables the model to deduce why certain prompts succeed while others fail, thereby achieving more robust and interpretable optimization. Experimental results on the HelpSteer2 benchmark demonstrate that CRPO significantly outperforms baselines. Our findings highlight the promise of contrastive, retrieval-augmented reasoning for advancing automatic prompt optimization.
CYApr 24
Smaller, Younger, and More Impactful: How AI-Assisted Writing Transforms Research TeamsHaoyang Wang, Mingze Zhang, Yi Bu et al.
The era of Big Science has long been defined by increasingly large and specialized research teams pushing the frontiers of knowledge. However, recent advances in artificial intelligence (AI), particularly large language models (LLMs), are beginning to reshape academic writing and scientific research, potentially disrupting the longstanding trend toward ever-larger teams and transforming other dimensions of research team structure. Drawing on 147,074 full-text publications from the PLoS family and the Nature portfolio since 2020, we examined whether and how AI-assisted writing influences team structure and team outcomes in science. Using multiple methods, including ordinary least square, quantile regression, Poisson regression, logistic regression and propensity score matching, we found that research teams using AI-assisted writing tend to be younger and smaller. Importantly, this shift toward more compact, junior-leaning teams does not come at the expense of scientific impact. On the contrary, we observed a higher probability of research teams that employed AI-assisted writing producing highly impactful publications. These results highlight the significant role of AI-assisted writing in reshaping not only how research is produced, but also how research teams are formed and assembled. Our findings call for policy improvements in research evaluation, funding, and training to address this emerging trend.
CLJan 29
Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent ReasoningWonduk Seo, Wonseok Choi, Junseo Koh et al.
Large Language Models (LLMs) increasingly support culturally sensitive decision making, yet often exhibit misalignment due to skewed pretraining data and the absence of structured value representations. Existing methods can steer outputs, but often lack demographic grounding and treat values as independent, unstructured signals, reducing consistency and interpretability. We propose OG-MAR, an Ontology-Guided Multi-Agent Reasoning framework. OG-MAR summarizes respondent-specific values from the World Values Survey (WVS) and constructs a global cultural ontology by eliciting relations over a fixed taxonomy via competency questions. At inference time, it retrieves ontology-consistent relations and demographically similar profiles to instantiate multiple value-persona agents, whose outputs are synthesized by a judgment agent that enforces ontology consistency and demographic proximity. Experiments on regional social-survey benchmarks across four LLM backbones show that OG-MAR improves cultural alignment and robustness over competitive baselines, while producing more transparent reasoning traces.
AIDec 7, 2025
Academic journals' AI policies fail to curb the surge in AI-assisted academic writingYongyuan He, Yi Bu
The rapid integration of generative AI into academic writing has prompted widespread policy responses from journals and publishers. However, the effectiveness of these policies remains unclear. Here, we analyze 5,114 journals and over 5.2 million papers to evaluate the real-world impact of AI usage guidelines. We show that despite 70% of journals adopting AI policies (primarily requiring disclosure), researchers' use of AI writing tools has increased dramatically across disciplines, with no significant difference between journals with or without policies. Non-English-speaking countries, physical sciences, and high-OA journals exhibit the highest growth rates. Crucially, full-text analysis on 164k scientific publications reveals a striking transparency gap: Of the 75k papers published since 2023, only 76 (0.1%) explicitly disclosed AI use. Our findings suggest that current policies have largely failed to promote transparency or restrain AI adoption. We urge a re-evaluation of ethical frameworks to foster responsible AI integration in science.
AIMar 30, 2025
SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data ScienceWonduk Seo, Juhyeon Lee, Yi Bu
Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent refines these strategies by suggesting several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science task.
CLJan 2, 2025
ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual LearningWonduk Seo, Zonghao Yuan, Yi Bu
Ensuring cultural values alignment in Large Language Models (LLMs) remains a critical challenge, as these models often embed Western-centric biases from their training data, leading to misrepresentations and fairness concerns in cross-cultural applications. Existing approaches such as role assignment and few-shot learning struggle to address these limitations effectively due to their reliance on pre-trained knowledge, limited scalability, and inability to capture nuanced cultural values. To address these issues, we propose ValuesRAG, a novel and effective framework that applies Retrieval-Augmented Generation (RAG) with In-Context Learning (ICL) to integrate cultural and demographic knowledge dynamically during text generation. Leveraging the World Values Survey (WVS) dataset, ValuesRAG first generates summaries of values for each individual. We subsequently curate several representative regional datasets to serve as test datasets and retrieve relevant summaries of values based on demographic features, followed by a reranking step to select the top-k relevant summaries. We evaluate ValuesRAG using 6 diverse regional datasets and show that it consistently outperforms baselines: including zero-shot, role-assignment, few-shot, and hybrid methods, both in main experiments and ablation settings. Notably, ValuesRAG achieves the best overall performance over prior methods, demonstrating its effectiveness in fostering culturally aligned and inclusive AI systems. Our findings underscore the potential of dynamic retrieval-based methods to bridge the gap between global LLM capabilities and localized cultural values.
MAOct 18, 2025
Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent AnalysisWonduk Seo, Juhyeon Lee, Junseo Koh et al.
Prompt optimization has emerged as an effective alternative to retraining for improving the performance of Large Language Models (LLMs). However, most existing approaches treat evaluation as a black box, relying solely on numerical scores while offering limited insight into why a prompt succeeds or fails. They also depend heavily on trial-and-error refinements, which are difficult to interpret and control. In this paper, we introduce MA-SAPO, a Multi-Agent framework for Score-Aware Prompt Optimization. Compared to prior methods, MA-SAPO explicitly couples evaluation outcomes with structured reasoning to guide systematic edits. The framework specifically consists of two stages: during the Reasoning Phase, agents collaboratively explain metric scores, diagnose weaknesses, and synthesize targeted refinements that are stored as reusable reasoning assets; during the Test Phase, agents retrieve these assets to analyze optimized prompts and apply only evidence-grounded edits. By turning evaluation signals into interpretable reasoning chains, MA-SAPO produces prompt refinements that are more transparent, auditable, and controllable. Experiments on the HelpSteer1/2 benchmarks demonstrate consistent improvements over single-pass prompting, retrieval-augmented baselines, and prior multi-agent strategies, validating the effectiveness of our approach.
DLFeb 17, 2022
The Gene of Scientific SuccessXiangjie Kong, Jun Zhang, Da Zhang et al.
This paper elaborates how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities including funding application, mentor recommendation, and discovering potential cooperators etc. It is universally acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard working. Therefore, scholars spend great efforts in making scientific achievements and improving scientific impact during their academic life. However, what are the determinate factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. Under this consideration, our paper presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors including article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors have the highest relevancy to scholars' future success in the computer science area. Additionally, we discover an interesting phenomenon that the h-index of scholars within the same institution or university are actually very close to each other.
SOC-PHAug 9, 2021
Team Power Dynamics and Team Impact: New Perspectives on Scientific Collaboration using Career Age as a Proxy for Team PowerHuimin Xu, Yi Bu, Meijun Liu et al.
Power dynamics influence every aspect of scientific collaboration. Team power dynamics can be measured by team power level and team power hierarchy. Team power level is conceptualized as the average level of the possession of resources, expertise, or decision-making authorities of a team. Team power hierarchy represents the vertical differences of the possessions of resources in a team. In Science of Science, few studies have looked at scientific collaboration from the perspective of team power dynamics. This research examines how team power dynamics affect team impact to fill the research gap. In this research, all co-authors of one publication are treated as one team. Team power level and team power hierarchy of one team are measured by the mean and Gini index of career age of co-authors in this team. Team impact is quantified by citations of a paper authored by this team. By analyzing over 7.7 million teams from Science (e.g., Computer Science, Physics), Social Sciences (e.g., Sociology, Library & Information Science), and Arts & Humanities (e.g., Art), we find that flat team structure is associated with higher team impact, especially when teams have high team power level. These findings have been repeated in all five disciplines except Art, and are consistent in various types of teams from Computer Science including teams from industry or academia, teams with different gender groups, teams with geographical contrast, and teams with distinct size.
AIJul 4, 2020
Coronavirus Knowledge Graph: A Case StudyChongyan Chen, Islam Akef Ebeid, Yi Bu et al.
The emergence of the novel COVID-19 pandemic has had a significant impact on global healthcare and the economy over the past few months. The virus's rapid widespread has led to a proliferation in biomedical research addressing the pandemic and its related topics. One of the essential Knowledge Discovery tools that could help the biomedical research community understand and eventually find a cure for COVID-19 are Knowledge Graphs. The CORD-19 dataset is a collection of publicly available full-text research articles that have been recently published on COVID-19 and coronavirus topics. Here, we use several Machine Learning, Deep Learning, and Knowledge Graph construction and mining techniques to formalize and extract insights from the PubMed dataset and the CORD-19 dataset to identify COVID-19 related experts and bio-entities. Besides, we suggest possible techniques to predict related diseases, drug candidates, gene, gene mutations, and related compounds as part of a systematic effort to apply Knowledge Discovery methods to help biomedical researchers tackle the pandemic.
CLJul 27, 2019
Analyzing Linguistic Complexity and Scientific ImpactChao Lu, Yi Bu, Xianlei Dong et al.
The number of publications and the number of citations received have become the most common indicators of scholarly success. In this context, scientific writing increasingly plays an important role in scholars' scientific careers. To understand the relationship between scientific writing and scientific impact, this paper selected 12 variables of linguistic complexity as a proxy for depicting scientific writing. We then analyzed these features from 36,400 full-text Biology articles and 1,797 full-text Psychology articles. These features were compared to the scientific impact of articles, grouped into high, medium, and low categories. The results suggested no practical significant relationship between linguistic complexity and citation strata in either discipline. This suggests that textual complexity plays little role in scientific impact in our data sets.
CLJul 22, 2018
Examining Scientific Writing Styles from the Perspective of Linguistic ComplexityChao Lu, Yi Bu, Jie Wang et al.
Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order to uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity.