CLJun 10, 2023
A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language TextJessica López Espejel, Mahaman Sanoussi Yahaya Alassan, El Mehdi Chouham et al.
Java Code Generation consists in generating automatically Java code from a Natural Language Text. This NLP task helps in increasing programmers' productivity by providing them with immediate solutions to the simplest and most repetitive tasks. Code generation is a challenging task because of the hard syntactic rules and the necessity of a deep understanding of the semantic aspect of the programming language. Many works tried to tackle this task using either RNN-based, or Transformer-based models. The latter achieved remarkable advancement in the domain and they can be divided into three groups: (1) encoder-only models, (2) decoder-only models, and (3) encoder-decoder models. In this paper, we provide a comprehensive review of the evolution and progress of deep learning models in Java code generation task. We focus on the most important methods and present their merits and limitations, as well as the objective functions used by the community. In addition, we provide a detailed description of datasets and evaluation metrics used in the literature. Finally, we discuss results of different models on CONCODE dataset, then propose some future directions.
CLAug 3, 2024Code
Sólo Escúchame: Spanish Emotional Accompaniment ChatbotBruno Gil Ramírez, Jessica López Espejel, María del Carmen Santiago Díaz et al.
According to the World Health Organization (WHO), suicide was the fourth leading cause of death in the world for individuals aged 15 to 29 in 2019. Given the rapid increase in mental health issues, providing psychological support is both crucial and urgent. In this paper: (1) we propose Sólo Escúchame, the first open-source Spanish emotional assistance chatbot, based on LLaMA-2-7b-Chat. (2) We introduced the HEAR (Hispanic Emotional Accompaniment Responses) dataset, compiled from multiple English sources translated into Spanish, as well as generic data generated using ChatGPT-3.5-Turbo. Finally, (3) we propose an evaluation metric based on two semi-automatic assessment methods. Our system outperforms a range of state-of-the-art models in providing psychological assistance in Spanish. Our models and datasets are publicly available to facilitate reproducibility.
CLMar 22, 2023
JaCoText: A Pretrained Model for Java Code-Text GenerationJessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane et al.
Pretrained transformer-based models have shown high performance in natural language generation task. However, a new wave of interest has surged: automatic programming language generation. This task consists of translating natural language instructions to a programming code. Despite the fact that well-known pretrained models on language generation have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on Transformers neural network. It aims to generate java source code from natural language text. JaCoText leverages advantages of both natural language and code generation models. More specifically, we study some findings from the state of the art and use them to (1) initialize our model from powerful pretrained models, (2) explore additional pretraining on our java dataset, (3) carry out experiments combining the unimodal and bimodal data in the training, and (4) scale the input and output length during the fine-tuning of the model. Conducted experiments on CONCODE dataset show that JaCoText achieves new state-of-the-art results.
CLJul 10, 2023
Entity Identifier: A Natural Text Parsing-based Framework For Entity Relation ExtractionEl Mehdi Chouham, Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan et al.
The field of programming has a diversity of paradigms that are used according to the working framework. While current neural code generation methods are able to learn and generate code directly from text, we believe that this approach is not optimal for certain code tasks, particularly the generation of classes in an object-oriented project. Specifically, we use natural language processing techniques to extract structured information from requirements descriptions, in order to automate the generation of CRUD (Create, Read, Update, Delete) class code. To facilitate this process, we introduce a pipeline for extracting entity and relation information, as well as a representation called an "Entity Tree" to model this information. We also create a dataset to evaluate the effectiveness of our approach.
CLJun 19, 2024Code
Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial ApplicationsMahaman Sanoussi Yahaya Alassan, Jessica López Espejel, Merieme Bouhandi et al.
Large Language Models (LLMs) have recently demonstrated remarkable performance in various Natural Language Processing (NLP) applications, such as sentiment analysis, content generation, and personalized recommendations. Despite their impressive capabilities, there remains a significant need for systematic studies concerning the practical application of LLMs in industrial settings, as well as the specific requirements and challenges related to their deployment in these contexts. This need is particularly critical for Machine Reading Comprehension (MCR), where factual, concise, and accurate responses are required. To date, most MCR rely on Small Language Models (SLMs) or Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM). This trend is evident in the SQuAD2.0 rankings on the Papers with Code table. This article presents a comparative analysis between open-source LLMs and proprietary models on this task, aiming to identify light and open-source alternatives that offer comparable performance to proprietary models.
CLOct 7, 2021Code
GeSERA: General-domain Summary Evaluation by Relevance AnalysisJessica López Espejel, Gaël de Chalendar, Jorge Garcia Flores et al.
We present GeSERA, an open-source improved version of SERA for evaluating automatic extractive and abstractive summaries from the general domain. SERA is based on a search engine that compares candidate and reference summaries (called queries) against an information retrieval document base (called index). SERA was originally designed for the biomedical domain only, where it showed a better correlation with manual methods than the widely used lexical-based ROUGE method. In this paper, we take out SERA from the biomedical domain to the general one by adapting its content-based method to successfully evaluate summaries from the general domain. First, we improve the query reformulation strategy with POS Tags analysis of general-domain corpora. Second, we replace the biomedical index used in SERA with two article collections from AQUAINT-2 and Wikipedia. We conduct experiments with TAC2008, TAC2009, and CNNDM datasets. Results show that, in most cases, GeSERA achieves higher correlations with manual evaluation methods than SERA, while it reduces its gap with ROUGE for general-domain summary evaluation. GeSERA even surpasses ROUGE in two cases of TAC2009. Finally, we conduct extensive experiments and provide a comprehensive study of the impact of human annotators and the index size on summary evaluation with SERA and GeSERA.
AIDec 4, 2024
From Words to Workflows: Automating Business ProcessesLaura Minkova, Jessica López Espejel, Taki Eddine Toufik Djaidja et al.
As businesses increasingly rely on automation to streamline operations, the limitations of Robotic Process Automation (RPA) have become apparent, particularly its dependence on expert knowledge and inability to handle complex decision-making tasks. Recent advancements in Artificial Intelligence (AI), particularly Generative AI (GenAI) and Large Language Models (LLMs), have paved the way for Intelligent Automation (IA), which integrates cognitive capabilities to overcome the shortcomings of RPA. This paper introduces Text2Workflow, a novel method that automatically generates workflows from natural language user requests. Unlike traditional automation approaches, Text2Workflow offers a generalized solution for automating any business process, translating user inputs into a sequence of executable steps represented in JavaScript Object Notation (JSON) format. Leveraging the decision-making and instruction-following capabilities of LLMs, this method provides a scalable, adaptable framework that enables users to visualize and execute workflows with minimal manual intervention. This research outlines the Text2Workflow methodology and its broader implications for automating complex business processes.
AIApr 17, 2024
Low-Cost Language Models: Survey and Performance Evaluation on Python Code GenerationJessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Merieme Bouhandi et al.
Large Language Models (LLMs) have become a popular choice for many Natural Language Processing (NLP) tasks due to their versatility and ability to produce high-quality results. Specifically, they are increasingly used for automatic code generation to help developers tackle repetitive coding tasks. However, LLMs' substantial computational and memory requirements often make them inaccessible to users with limited resources. This paper focuses on very low-cost models which offer a more accessible alternative to resource-intensive LLMs. We notably: (1) propose a thorough semi-manual evaluation of their performance in generating Python code, (2) introduce a Chain-of-Thought (CoT) prompting strategy to improve model reasoning and code quality, and (3) propose a new dataset of 60 programming problems, with varied difficulty levels, designed to extend existing benchmarks like HumanEval and EvalPlus. Our findings show that some low-cost compatible models achieve competitive results compared to larger models like ChatGPT despite using significantly fewer resources. We will make our dataset and prompts publicly available to support further research.
HCMay 5, 2024
Visual Grounding Methods for Efficient Interaction with Desktop Graphical User InterfacesEl Hassane Ettifouri, Jessica López Espejel, Laura Minkova et al.
Most visual grounding solutions primarily focus on realistic images. However, applications involving synthetic images, such as Graphical User Interfaces (GUIs), remain limited. This restricts the development of autonomous computer vision-powered artificial intelligence (AI) agents for automatic application interaction. Enabling AI to effectively understand and interact with GUIs is crucial to advancing automation in software testing, accessibility, and human-computer interaction. In this work, we explore Instruction Visual Grounding (IVG), a multi-modal approach to object identification within a GUI. More precisely, given a natural language instruction and a GUI screen, IVG locates the coordinates of the element on the screen where the instruction should be executed. We propose two main methods: (1) IVGocr, which combines a Large Language Model (LLM), an object detection model, and an Optical Character Recognition (OCR) module; and (2) IVGdirect, which uses a multimodal architecture for end-to-end grounding. For each method, we introduce a dedicated dataset. In addition, we propose the Central Point Validation (CPV) metric, a relaxed variant of the classical Central Proximity Score (CPS) metric. Our final test dataset is publicly released to support future research.
CLMay 21, 2023
GPT-3.5, GPT-4, or BARD? Evaluating LLMs Reasoning Ability in Zero-Shot Setting and Performance Boosting Through PromptsJessica López Espejel, El Hassane Ettifouri, Mahaman Sanoussi Yahaya Alassan et al.
Large Language Models (LLMs) have exhibited remarkable performance on various Natural Language Processing (NLP) tasks. However, there is a current hot debate regarding their reasoning capacity. In this paper, we examine the performance of GPT-3.5, GPT-4, and BARD models, by performing a thorough technical evaluation on different reasoning tasks across eleven distinct datasets. Our paper provides empirical evidence showcasing the superior performance of ChatGPT-4 in comparison to both ChatGPT-3.5 and BARD in zero-shot setting throughout almost all evaluated tasks. While the superiority of GPT-4 compared to GPT-3.5 might be explained by its larger size and NLP efficiency, this was not evident for BARD. We also demonstrate that the three models show limited proficiency in Inductive, Mathematical, and Multi-hop Reasoning Tasks. To bolster our findings, we present a detailed and comprehensive analysis of the results from these three models. Furthermore, we propose a set of engineered prompts that enhances the zero-shot setting performance of all three models.