SE · May 15
From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs
Junaid Baber, Nicolas Hili, Didier Schwab et al.
Large Language Models (LLMs) have shown increasing potential in automating model-driven software engineering tasks, particularly in generating models conforming to Domain Specific Languages (DSLs) from natural language. While most existing approaches rely on large proprietary models, their high cost and limited deployability hinder broader adoption. In this paper, we evaluate whether open-source LLMs of varying sizes (0.5B to 32B parameters) can generate DSL-conformant models using only few-shot prompting, without any fine-tuning. Our evaluation focuses on key model-driven engineering (MDE) requirements, including syntactic validity, semantic completeness, and inter-model reference consistency. We extend our prior work by moving from generating user interface models (referred to as "UI models" in this paper) over fixed, predefined data schemas ("data models") to generating both the UI and data models entirely from scratch. This shift serves two purposes: first, it highlights the LLM's ability to infer domain-specific relationships and maintain consistency across multiple interconnected models; second, it allows us to generalize earlier findings by testing DSL generation across models of different natures and structural roles. Our structured evaluation combines automatic parsing and expert feedback across 39 LLMs, revealing that several compact models (e.g., gemma3:12b, mistral:7b-instruct) approach or match the quality of much larger models. These findings demonstrate the feasibility of using smaller, open-source LLMs for grammar-conformant DSL generation in MDE workflows, offering a cost-effective and deployable alternative to closed LLMs.
CV · Sep 16, 2021
Compact Binary Fingerprint for Image Copy Re-Ranking
Nazar Mohammad, Junaid Baber, Maheen Bakhtyar et al.
Image copy detection is a challenging and appealing topic in computer vision and signal processing. Recent advances in multimedia have made the distribution of images across the globe easy and fast, which leads to other issues such as forgery and image copy retrieval. Local keypoint descriptors such as SIFT are used to represent images, and images are matched and retrieved by matching those descriptors. Features are quantized so that searching and matching become feasible for large databases, at the cost of some accuracy. In this paper, we propose a binary feature obtained by quantizing SIFT descriptors into binary form, and the ranked list is re-examined to remove false positives. Experiments on a challenging dataset show gains in accuracy and time.
CL · Jul 6, 2021
Identifying negativity factors from social media text corpus using sentiment analysis method
Mohammad Aimal, Maheen Bakhtyar, Junaid Baber et al.
Automatic sentiment analysis plays a vital role in decision making. Many organizations spend a large share of their budget on understanding customer satisfaction by manually going over feedback, comments, or tweets. Automatic sentiment analysis can give an overall picture of the comments received about any event, product, or activity. Usually, comments and tweets are classified into two main classes, negative or positive. However, the negative class is too abstract to reveal the underlying reason or context, and organizations are interested in identifying the exact reason for the negativity. In this research study, we go hierarchically down into negative comments and link them with further classes. Tweets are extracted from social media sites such as Twitter and Facebook. If sentiment analysis classifies a tweet as negative, we then try to associate that negative comment with more specific negative classes. Based on expert opinions, the negative comments and tweets are further classified into 8 classes. Different machine learning algorithms are evaluated and their accuracies are reported.
IR · Jun 12, 2021
BIOPAK Flasher: Epidemic disease monitoring and detection in Pakistan using text mining
Muhammad Nasir, Maheen Bakhtyar, Junaid Baber et al.
Infectious disease outbreaks have a significant impact on morbidity and mortality and can cause economic instability in many countries. As global trade grows, goods and individuals are expected to travel across borders, and a carrier from an epidemic area can pose a great danger to the host region. If a disease outbreak is recognized promptly, then commercial products and travelers (traders/visitors) can be effectively vaccinated, and the disease thereby stopped. Early detection of outbreaks plays an important role here, and prompts the rapid implementation of control measures by citizens, public health organizations, and government. Many sources, such as online news feeds (RSS) and social media (Twitter, Facebook), carry valuable information that can be used to extract information about disease outbreaks, but they are unstructured and voluminous. A few early warning systems for outbreaks exist, but they are limited in language coverage (Urdu) and geographic coverage (Pakistan). In Pakistan, few channels publish outbreak news in Urdu or English. The aim is to procure information from Pakistan's English and Urdu news channels and then investigate, process, integrate, and visualize disease epidemics. No Urdu ontology previously existed for matching extracted diseases, so we also build a disease ontology for Urdu.