AI, May 11, 2022
Detecting Emerging Technologies and their Evolution using Deep Learning and Weak Signal Analysis
Ashkan Ebadi, Alain Auger, Yvan Gauthier
Emerging technologies can have major economic impacts and affect strategic stability. Yet, early identification of emerging technologies remains challenging. In order to identify emerging technologies in a timely and reliable manner, a comprehensive examination of relevant scientific and technological (S&T) trends and their related references is required. This examination is generally done by domain experts and requires significant amounts of time and effort to gain insights. The use of domain experts to identify emerging technologies from S&T trends may limit the capacity to analyse large volumes of information and introduce subjectivity in the assessments. Decision support systems are required to provide accurate and reliable evidence-based indicators through constant and continuous monitoring of the environment and help identify signals of emerging technologies that could alter security and economic prosperity. For example, the research field of hypersonics has recently witnessed several advancements having profound technological, commercial, and national security implications. In this work, we present a multi-layer quantitative approach able to identify future signs from scientific publications on hypersonics by leveraging deep learning and weak signal analysis. The proposed framework can help strategic planners and domain experts better identify and monitor emerging technology trends.
CL, Aug 17, 2022
On the evolution of research in hypersonics: application of natural language processing and machine learning
Ashkan Ebadi, Alain Auger, Yvan Gauthier
Research and development in hypersonics have progressed significantly in recent years, with various military and commercial applications being demonstrated increasingly. Public and private organizations in several countries have been investing in hypersonics, with the aim to overtake their competitors and secure/improve strategic advantage and deterrence. For these organizations, being able to identify emerging technologies in a timely and reliable manner is paramount. Recent advances in information technology have made it possible to analyze large amounts of data, extract hidden patterns, and provide decision-makers with new insights. In this study, we focus on scientific publications about hypersonics within the period of 2000-2020, and employ natural language processing and machine learning to characterize the research landscape by identifying 12 key latent research themes and analyzing their temporal evolution. Our publication similarity analysis revealed patterns that are indicative of cycles during two decades of research. The study offers a comprehensive analysis of the research field and the fact that the research themes are algorithmically extracted removes subjectivity from the exercise and enables consistent comparisons between topics and between time intervals.
CL, Aug 4, 2025
Test Set Quality in Multilingual LLM Evaluation
Chalamalasetti Kranti, Gabriel Bernier-Colborne, Yvan Gauthier et al.
Several multilingual benchmark datasets have been developed in a semi-automatic manner in the recent past to measure progress and understand the state-of-the-art in the multilingual capabilities of Large Language Models. However, little attention has been paid to the quality of the datasets themselves, despite the existence of previous work identifying errors in even fully human-annotated test sets. In this paper, we manually analyze recent multilingual evaluation sets in two languages, French and Telugu, identifying several errors in the process. We compare the performance of several LLMs on the original and revised versions of the datasets and identify large differences (almost 10% in some cases) in both languages. Based on these results, we argue that test sets should not be considered immutable and should be revisited, checked for correctness, and potentially versioned. We end with some recommendations for both dataset creators and consumers on addressing dataset quality issues.