LGJul 3, 2022
Mental Illness Classification on Social Media Texts using Deep Learning and Transfer LearningIqra Ameer, Muhammad Arif, Grigori Sidorov et al.
Given the current social distance restrictions across the world, most individuals now use social media as their major medium of communication. Millions of people suffering from mental diseases have been isolated due to this, and they are unable to get help in person. They have become more reliant on online venues to express themselves and seek advice on dealing with their mental disorders. According to the World health organization (WHO), approximately 450 million people are affected. Mental illnesses, such as depression, anxiety, etc., are immensely common and have affected an individuals' physical health. Recently Artificial Intelligence (AI) methods have been presented to help mental health providers, including psychiatrists and psychologists, in decision making based on patients' authentic information (e.g., medical records, behavioral data, social media utilization, etc.). AI innovations have demonstrated predominant execution in numerous real-world applications broadening from computer vision to healthcare. This study analyzes unstructured user data on the Reddit platform and classifies five common mental illnesses: depression, anxiety, bipolar disorder, ADHD, and PTSD. We trained traditional machine learning, deep learning, and transfer learning multi-class models to detect mental disorders of individuals. This effort will benefit the public health system by automating the detection process and informing appropriate authorities about people who require emergency assistance.
AIMay 22
EPPC-OASIS: Ontology-Aware Adaptation and Structured Inference Refinement for Electronic Patient-Provider Communication Mining in Secure MessagesSamah Fodeh, Sreeraj Ramachandran, Elyas Irankhah et al.
Secure patient-provider messages contain clinically important communication behaviors that are difficult to characterize manually at scale. The Electronic Patient-Provider Communication (EPPC) framework provides an ontology for coding these behaviors, but automated extraction remains challenging because predictions must preserve fine-grained code/sub-code structure while grounding annotations in message text. We developed EPPC-OASIS, an ontology-aware adaptation approach for structured EPPC extraction, and combined it with deployable inference-refinement procedures designed to improve the coherence of final annotations. EPPC-OASIS augments supervised fine-tuning with a Wasserstein alignment objective that encourages alignment between model representation neighborhoods and EPPC ontology-derived neighborhoods, while inference refinement uses verification, self-consistency, hybrid correction, and selection or ensembling to address residual prediction errors. We evaluated the framework on a de-identified corpus of secure patient-provider messages against prompting, supervised fine-tuning, preference-based, and robustness-oriented baselines across multiple open-weight language models. Across model families, the best deployable pipeline achieved 77.13% Code+Sub-code F1 and 63.83% Triplet F1, corresponding to modest but consistent absolute gains of +1.39 and +2.12 F1 points over the strongest supervised fine-tuning baseline. These results suggest that ontology-aware adaptation with structured inference refinement can support scalable retrospective EPPC mining, although external validation is needed before operational use.
LGJun 25, 2025Code
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and RegressionAinaz Jamshidi, Dongchan Kim, Muhammad Arif
Predictive maintenance (PdM) has become a crucial element of modern industrial practice. PdM plays a significant role in operational dependability and cost management by decreasing unforeseen downtime and optimizing asset life cycle management. Machine learning and deep learning have enabled more precise forecasts of equipment failure and remaining useful life (RUL). Although many studies have been conducted on PdM, there has not yet been a standalone comparative study between regression- and classification-based approaches. In this review, we look across a range of PdM methodologies, while focusing more strongly on the comparative use of classification and regression methods in prognostics. While regression-based methods typically provide estimates of RUL, classification-based methods present a forecast of the probability of failure across defined time intervals. Through a comprehensive analysis of recent literature, we highlight key advancements, challenges-such as data imbalance and high-dimensional feature spaces-and emerging trends, including hybrid approaches and AI-enabled prognostic systems. This review aims to provide researchers and practitioners with an awareness of the strengths and compromises of various PdM methods and to help identify future research and build more robust, directed adaptive maintenance systems. Future work may include a systematic review of practical aspects such as public datasets, benchmarking platforms, and open-source tools to support the advancement of PdM research.
CVFeb 21, 2017Code
Scene Recognition by Combining Local and Global Image DescriptorsJobin Wilson, Muhammad Arif
Object recognition is an important problem in computer vision, having diverse applications. In this work, we construct an end-to-end scene recognition pipeline consisting of feature extraction, encoding, pooling and classification. Our approach simultaneously utilize global feature descriptors as well as local feature descriptors from images, to form a hybrid feature descriptor corresponding to each image. We utilize DAISY features associated with key points within images as our local feature descriptor and histogram of oriented gradients (HOG) corresponding to an entire image as a global descriptor. We make use of a bag-of-visual-words encoding and apply Mini- Batch K-Means algorithm to reduce the complexity of our feature encoding scheme. A 2-level pooling procedure is used to combine DAISY and HOG features corresponding to each image. Finally, we experiment with a multi-class SVM classifier with several kernels, in a cross-validation setting, and tabulate our results on the fifteen scene categories dataset. The average accuracy of our model was 76.4% in the case of a 40%-60% random split of images into training and testing datasets respectively. The primary objective of this work is to clearly outline the practical implementation of a basic screne-recognition pipeline having a reasonable accuracy, in python, using open-source libraries. A full implementation of the proposed model is available in our github repository.
LGDec 17, 2024
Synthetic Time Series Data Generation for Healthcare Applications: A PCG Case StudyAinaz Jamshidi, Muhammad Arif, Sabir Ali Kalhoro et al.
The generation of high-quality medical time series data is essential for advancing healthcare diagnostics and safeguarding patient privacy. Specifically, synthesizing realistic phonocardiogram (PCG) signals offers significant potential as a cost-effective and efficient tool for cardiac disease pre-screening. Despite its potential, the synthesis of PCG signals for this specific application received limited attention in research. In this study, we employ and compare three state-of-the-art generative models from different categories - WaveNet, DoppelGANger, and DiffWave - to generate high-quality PCG data. We use data from the George B. Moody PhysioNet Challenge 2022. Our methods are evaluated using various metrics widely used in the previous literature in the domain of time series data generation, such as mean absolute error and maximum mean discrepancy. Our results demonstrate that the generated PCG data closely resembles the original datasets, indicating the effectiveness of our generative models in producing realistic synthetic PCG data. In our future work, we plan to incorporate this method into a data augmentation pipeline to synthesize abnormal PCG signals with heart murmurs, in order to address the current scarcity of abnormal data. We hope to improve the robustness and accuracy of diagnostic tools in cardiology, enhancing their effectiveness in detecting heart murmurs.
CLApr 25, 2025
EDU-NER-2025: Named Entity Recognition in Urdu Educational Texts using XLM-RoBERTa with X (formerly Twitter)Fida Ullah, Muhammad Ahmad, Muhammad Tayyab Zamir et al.
Named Entity Recognition (NER) plays a pivotal role in various Natural Language Processing (NLP) tasks by identifying and classifying named entities (NEs) from unstructured data into predefined categories such as person, organization, location, date, and time. While extensive research exists for high-resource languages and general domains, NER in Urdu particularly within domain-specific contexts like education remains significantly underexplored. This is Due to lack of annotated datasets for educational content which limits the ability of existing models to accurately identify entities such as academic roles, course names, and institutional terms, underscoring the urgent need for targeted resources in this domain. To the best of our knowledge, no dataset exists in the domain of the Urdu language for this purpose. To achieve this objective this study makes three key contributions. Firstly, we created a manually annotated dataset in the education domain, named EDU-NER-2025, which contains 13 unique most important entities related to education domain. Second, we describe our annotation process and guidelines in detail and discuss the challenges of labelling EDU-NER-2025 dataset. Third, we addressed and analyzed key linguistic challenges, such as morphological complexity and ambiguity, which are prevalent in formal Urdu texts.
NIMay 9, 2019
Open-source RANs in practice: an over-the-air deployment for 5G MECJuuso Haavisto, Muhammad Arif, Lauri Lovén et al.
Edge computing that leverages cloud resources to the proximity of user devices is seen as the future infrastructure for distributed applications. However, developing and deploying edge applications, that rely on cellular networks, is burdensome. Such network infrastructures are often based on proprietary components, each with unique programming abstractions and interfaces. To facilitate straightforward deployment of edge applications, we introduce OSS based RAN on OTA commercial spectrum with DevOps capabilities. OSS allows software modifications and integrations of the system components, e.g., EPC and edge hosts running applications, required for new data pipelines and optimizations not addressed in standardization. Such an OSS infrastructure enables further research and prototyping of novel end-user applications in an environment familiar to software engineers without telecommunications background. We evaluated the presented infrastructure with E2E OTA testing, resulting in 7.5MB/s throughput and latency of 21ms, which shows that the presented infrastructure provides low latency for edge applications.
AIOct 10, 2016
Ranking academic institutions on potential paper acceptance in upcoming conferencesJobin Wilson, Ram Mohan, Muhammad Arif et al.
The crux of the problem in KDD Cup 2016 involves developing data mining techniques to rank research institutions based on publications. Rank importance of research institutions are derived from predictions on the number of full research papers that would potentially get accepted in upcoming top-tier conferences, utilizing public information on the web. This paper describes our solution to KDD Cup 2016. We used a two step approach in which we first identify full research papers corresponding to each conference of interest and then train two variants of exponential smoothing models to make predictions. Our solution achieves an overall score of 0.7508, while the winning submission scored 0.7656 in the overall results.