Mücahit Çevik

h-index: 18

28 papers

295 citations

Novelty: 38%

AI Score: 35

Ranked #108,724 of 194,257 authors (top 56%); #23,917 in LG (top 60%)

28 Papers

7.4 | SE | Apr 12, 2022 | Code

S-DABT: Schedule and Dependency-Aware Bug Triage in Open-Source Bug Tracking Systems

Hadi Jahanshahi, Mucahit Cevik

Fixing bugs in a timely manner lowers various potential costs in software maintenance. However, manual bug fixing scheduling can be time-consuming, cumbersome, and error-prone. In this paper, we propose the Schedule and Dependency-aware Bug Triage (S-DABT), a bug triaging method that utilizes integer programming and machine learning techniques to assign bugs to suitable developers. Unlike prior works that largely focus on a single component of the bug reports, our approach takes into account the textual data, bug fixing costs, and bug dependencies. We further incorporate the schedule of developers in our formulation to have a more comprehensive model for this multifaceted problem. As a result, this complete formulation considers developers' schedules and the blocking effects of the bugs while covering the most significant aspects of the previously proposed methods. Our numerical study on four open-source software systems, namely, EclipseJDT, LibreOffice, GCC, and Mozilla, shows that taking into account the schedules of the developers decreases the average bug fixing times. We find that S-DABT leads to a high level of developer utilization through a fair distribution of the tasks among the developers and efficient use of the free spots in their schedules. Via the simulation of the issue tracking system, we also show how incorporating the schedule in the model formulation reduces the bug fixing time, improves the assignment accuracy, and utilizes the capability of each developer without much compromise in the model run times. We find that S-DABT decreases the complexity of the bug dependency graph by prioritizing blocking bugs and effectively reduces the infeasible assignment ratio due to bug dependencies. Consequently, we recommend considering developers' schedules while automating bug triage.

5.5 | SE | Jan 9, 2023 | Code

Transfer learning for conflict and duplicate detection in software requirement pairs

Garima Malik, Savas Yildirim, Mucahit Cevik et al.

Consistent and holistic expression of software requirements is important for the success of software projects. In this study, we aim to enhance the efficiency of the software development processes by automatically identifying conflicting and duplicate software requirement specifications. We formulate the conflict and duplicate detection problem as a requirement pair classification task. We design a novel transformer-based architecture, SR-BERT, which incorporates Sentence-BERT and Bi-encoders for the conflict and duplicate identification task. Furthermore, we apply supervised multi-stage fine-tuning to the pre-trained transformer models. We test the performance of different transfer models using four different datasets. We find that sequentially trained and fine-tuned transformer models perform well across the datasets with SR-BERT achieving the best performance for larger datasets. We also explore the cross-domain performance of conflict detection models and adopt a rule-based filtering approach to validate the model classifications. Our analysis indicates that the sentence pair classification approach and the proposed transformer-based natural language processing strategies can contribute significantly to achieving automation in conflict and duplicate detection.

4.3 | ST | Mar 23, 2023

Explaining Exchange Rate Forecasts with Macroeconomic Fundamentals Using Interpretive Machine Learning

Davood Pirayesh Neghab, Mucahit Cevik, M. I. M. Wahab

The complexity and ambiguity of financial and economic systems, along with frequent changes in the economic environment, have made it difficult to make precise predictions that are supported by theory-consistent explanations. Interpreting the prediction models used for forecasting important macroeconomic indicators is highly valuable for understanding relations among different factors, increasing trust towards the prediction models, and making predictions more actionable. In this study, we develop a fundamental-based model for the Canadian-U.S. dollar exchange rate within an interpretative framework. We propose a comprehensive approach using machine learning to predict the exchange rate and employ interpretability methods to accurately analyze the relationships among macroeconomic variables. Moreover, we implement an ablation study based on the output of the interpretations to improve the predictive accuracy of the models. Our empirical results show that crude oil, as Canada's main commodity export, is the leading factor that determines the exchange rate dynamics with time-varying effects. The changes in the sign and magnitude of the contributions of crude oil to the exchange rate are consistent with significant events in the commodity and energy markets and the evolution of the crude oil trend in Canada. Gold and the TSX stock index are found to be the second and third most important variables that influence the exchange rate. Accordingly, this analysis provides trustworthy and practical insights for policymakers and economists and accurate knowledge about the predictive model's decisions, which are supported by theoretical considerations.

2.1 | AI | Aug 4, 2023

Assessing the impact of emergency department short stay units using length-of-stay prediction and discrete event simulation

Mucahit Cevik, Can Kavaklioglu, Fahad Razak et al.

Accurately predicting hospital length-of-stay at the time a patient is admitted to hospital may help guide clinical decision making and resource allocation. In this study we aim to build a decision support system that predicts hospital length-of-stay for patients admitted to general internal medicine from the emergency department. We conduct an exploratory data analysis and employ feature selection methods to identify the attributes that result in the best predictive performance. We also develop a discrete-event simulation model to assess the performances of the prediction models in a practical setting. Our results show that the recommendation performances of the proposed approaches are generally acceptable and do not benefit from the feature selection. Further, the results indicate that hospital length-of-stay could be predicted with reasonable accuracy (e.g., AUC value for classifying short and long stay patients is 0.69) using patient admission demographics, laboratory test results, diagnostic imaging, vital signs and clinical documentation.

4.3 | SE | Nov 2, 2022 | Code

ADPTriage: Approximate Dynamic Programming for Bug Triage

Hadi Jahanshahi, Mucahit Cevik, Kianoush Mousavi et al.

Bug triaging is a critical task in any software development project. It entails triagers going over a list of open bugs, deciding whether each is required to be addressed, and, if so, which developer should fix it. However, the manual bug assignment in issue tracking systems (ITS) offers only a limited solution and might easily fail when triagers must handle a large number of bug reports. During the automated assignment, there are multiple sources of uncertainties in the ITS, which should be addressed meticulously. In this study, we develop a Markov decision process (MDP) model for an online bug triage task. In addition to an optimization-based myopic technique, we provide an ADP-based bug triage solution, called ADPTriage, which has the ability to reflect the downstream uncertainty in the bug arrivals and developers' timetables. Specifically, without placing any limits on the underlying stochastic process, this technique enables real-time decision-making on bug assignments while taking into consideration developers' expertise, bug type, and bug fixing time. Our results show a significant improvement over the myopic approach in terms of assignment accuracy and fixing time. We also demonstrate the empirical convergence of the model and conduct sensitivity analysis with various model parameters. Accordingly, this work constitutes a significant step forward in addressing the uncertainty in bug triage solutions.

2.0 | LG | Jan 2, 2023

A Concurrent CNN-RNN Approach for Multi-Step Wind Power Forecasting

Syed Kazmi, Berk Gorgulu, Mucahit Cevik et al.

Wind power forecasting helps with the planning for the power systems by contributing to having a higher level of certainty in decision-making. Due to the randomness inherent to meteorological events (e.g., wind speeds), making highly accurate long-term predictions for wind power can be extremely difficult. One approach to remedy this challenge is to utilize weather information from multiple points across a geographical grid to obtain a holistic view of the wind patterns, along with temporal information from the previous power outputs of the wind farms. Our proposed CNN-RNN architecture combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract spatial and temporal information from multi-dimensional input data to make day-ahead predictions. In this regard, our method incorporates an ultra-wide learning view, combining data from multiple numerical weather prediction models, wind farms, and geographical locations. Additionally, we experiment with global forecasting approaches to understand the impact of training the same model over the datasets obtained from multiple different wind farms, and we employ a method where spatial information extracted from convolutional layers is passed to a tree ensemble (e.g., Light Gradient Boosting Machine (LGBM)) instead of fully connected layers. The results show that our proposed CNN-RNN architecture outperforms other models such as LGBM, Extra Tree regressor and linear regression when trained globally, but fails to replicate such performance when trained individually on each farm. We also observe that passing the spatial information from CNN to LGBM improves its performance, providing further evidence of CNN's spatial feature extraction capabilities.

3.3 | LG | Apr 18, 2022

Time Series Clustering for Grouping Products Based on Price and Sales Patterns

Aysun Bozanta, Sean Berry, Mucahit Cevik et al.

Developing technology and changing lifestyles have made online grocery delivery applications an indispensable part of urban life. Since the beginning of the COVID-19 pandemic, the demand for such applications has dramatically increased, creating new competitors that disrupt the market. An increasing level of competition might prompt companies to frequently restructure their marketing and product pricing strategies. Therefore, identifying the change patterns in product prices and sales volumes would provide a competitive advantage for the companies in the marketplace. In this paper, we investigate alternative clustering methodologies to group the products based on the price patterns and sales volumes. We propose a novel distance metric that takes into account how product prices and sales move together rather than calculating the distance using numerical values. We compare our approach with traditional clustering algorithms, which typically rely on generic distance metrics such as Euclidean distance, and image clustering approaches that aim to group data by capturing its visual patterns. We evaluate the performances of different clustering algorithms using our custom evaluation metric as well as Calinski Harabasz and Davies Bouldin indices, which are commonly used internal validity metrics. We conduct our numerical study using a proprietary price dataset from an online food and grocery delivery company, and the publicly available Favorita sales dataset. We find that our proposed clustering approach and image clustering both perform well for finding the products with similar price and sales patterns within large datasets.

3.8 | LG | Jan 5, 2023

DANLIP: Deep Autoregressive Networks for Locally Interpretable Probabilistic Forecasting

Ozan Ozyegen, Juyoung Wang, Mucahit Cevik

Despite the high performance of neural network-based time series forecasting methods, the inherent challenge in explaining their predictions has limited their applicability in certain application areas. Due to the difficulty in identifying causal relationships between the input and output of such black-box methods, they rarely have been adopted in domains such as legal and medical fields in which the reliability and interpretability of the results can be essential. In this paper, we propose DANLIP, a novel deep learning-based probabilistic time series forecasting architecture that is intrinsically interpretable. We conduct experiments with multiple datasets and performance metrics and empirically show that our model is not only interpretable but also provides comparable performance to state-of-the-art probabilistic time series forecasting methods. Furthermore, we demonstrate that interpreting the parameters of the stochastic processes of interest can provide useful insights into several application areas.

6.6 | IV | Jul 13, 2022

Improved $α$-GAN architecture for generating 3D connected volumes with an application to radiosurgery treatment planning

Sanaz Mohammadjafari, Mucahit Cevik, Ayse Basar

Generative Adversarial Networks (GANs) have gained significant attention in several computer vision tasks for generating high-quality synthetic data. Various medical applications including diagnostic imaging and radiation therapy can benefit greatly from synthetic data generation due to data scarcity in the domain. However, medical image data is typically kept in 3D space, and generative models suffer from the curse of dimensionality issues in generating such synthetic data. In this paper, we investigate the potential of GANs for generating connected 3D volumes. We propose an improved version of 3D $α$-GAN by incorporating various architectural enhancements. On a synthetic dataset of connected 3D spheres and ellipsoids, our model can generate fully connected 3D shapes with similar geometrical characteristics to that of training data. We also show that our 3D GAN model can successfully generate high-quality 3D tumor volumes and associated treatment specifications (e.g., isocenter locations). Similar moment invariants to the training data as well as fully connected 3D shapes confirm that improved 3D $α$-GAN implicitly learns the training data distribution, and generates realistic-looking samples. The capability of improved 3D $α$-GAN makes it a valuable source for generating synthetic medical image data that can help future research in this domain.

4.6 | LG | Aug 1, 2022

Interpretable Time Series Clustering Using Local Explanations

Ozan Ozyegen, Nicholas Prayogo, Mucahit Cevik et al.

This study focuses on exploring the use of local interpretability methods for explaining time series clustering models. Many of the state-of-the-art clustering models are not directly explainable. To provide explanations for these clustering algorithms, we train classification models to estimate the cluster labels. Then, we use interpretability methods to explain the decisions of the classification models. The explanations are used to obtain insights into the clustering models. We perform a detailed numerical study to test the proposed approach on multiple datasets, clustering models, and classification models. The analysis of the results shows that the proposed approach can be used to explain time series clustering models, specifically when the underlying classification model is accurate. Lastly, we provide a detailed analysis of the results, discussing how our approach can be used in a real-life scenario.

4.4 | OC | Nov 21, 2023

Neural Approximate Dynamic Programming for the Ultra-fast Order Dispatching Problem

Arash Dehghan, Mucahit Cevik, Merve Bodur

Same-Day Delivery (SDD) services aim to maximize the fulfillment of online orders while minimizing delivery delays but are beset by operational uncertainties such as those in order volumes and courier planning. Our work aims to enhance the operational efficiency of SDD by focusing on the ultra-fast Order Dispatching Problem (ODP), which involves matching and dispatching orders to couriers within a centralized warehouse setting, and completing the delivery within a strict timeline (e.g., within minutes). We introduce important extensions to ultra-fast ODP such as order batching and explicit courier assignments to provide a more realistic representation of dispatching operations and improve delivery efficiency. As a solution method, we primarily focus on NeurADP, a methodology that combines Approximate Dynamic Programming (ADP) and Deep Reinforcement Learning (DRL), and our work constitutes the first application of NeurADP outside of the ride-pool matching problem. NeurADP is particularly suitable for ultra-fast ODP as it addresses complex one-to-many matching and routing intricacies through a neural network-based VFA that captures high-dimensional problem dynamics without requiring manual feature engineering as in generic ADP methods. We test our proposed approach using four distinct realistic datasets tailored for ODP and compare the performance of NeurADP against myopic and DRL baselines by also making use of non-trivial bounds to assess the quality of the policies. Our numerical results indicate that the inclusion of order batching and courier queues enhances the efficiency of delivery operations and that NeurADP significantly outperforms other methods. Detailed sensitivity analysis with important parameters confirms the robustness of NeurADP under different scenarios, including variations in courier numbers, spatial setup, vehicle capacity, and permitted delay time.

0.3 | CL | Oct 5, 2022

Token Classification for Disambiguating Medical Abbreviations

Mucahit Cevik, Sanaz Mohammad Jafari, Mitchell Myers et al.

Abbreviations are unavoidable yet critical parts of the medical text. Using abbreviations, especially in clinical patient notes, can save time and space, protect sensitive information, and help avoid repetitions. However, most abbreviations might have multiple senses, and the lack of a standardized mapping system makes disambiguating abbreviations a difficult and time-consuming task. The main objective of this study is to examine the feasibility of token classification methods for medical abbreviation disambiguation. Specifically, we explore the capability of token classification methods to deal with multiple unique abbreviations in a single text. We use two public datasets to compare and contrast the performance of several transformer models pre-trained on different scientific and medical corpora. Our proposed token classification approach outperforms the more commonly used text classification models for the abbreviation disambiguation task. In particular, the SciBERT model shows a strong performance for both token and text classification tasks over the two considered datasets. Furthermore, we find that abbreviation disambiguation performance for the text classification models becomes comparable to that of token classification only when postprocessing is applied to their predictions, which involves filtering possible labels for an abbreviation based on the training data.

2.5 | AI | Jun 28, 2022

Linear programming-based solution methods for constrained partially observable Markov decision processes

Robert K. Helmeczi, Can Kavaklioglu, Mucahit Cevik

Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena. However, they are notoriously difficult to solve to optimality, and there exist only a few approximation methods for obtaining high-quality solutions. In this study, grid-based approximations are used in combination with linear programming (LP) models to generate approximate policies for CPOMDPs. A detailed numerical study is conducted with six CPOMDP problem instances considering both their finite and infinite horizon formulations. The quality of approximation algorithms for solving unconstrained POMDP problems is established through a comparative analysis with exact solution methods. Then, the performance of the LP-based CPOMDP solution approaches for varying budget levels is evaluated. Finally, the flexibility of LP-based approaches is demonstrated by applying deterministic policy constraints, and a detailed investigation into their impact on rewards and CPU run time is provided. For most of the finite horizon problems, deterministic policy constraints are found to have little impact on expected reward, but they introduce a significant increase to CPU run time. For infinite horizon problems, the reverse is observed: deterministic policies tend to yield lower expected total rewards than their stochastic counterparts, but the impact of deterministic constraints on CPU run time is negligible in this case. Overall, these results demonstrate that LP models can effectively generate approximate policies for both finite and infinite horizon problems while providing the flexibility to incorporate various additional constraints into the underlying model.

2.5 | AI | Jun 10, 2022

A multi-objective constrained POMDP model for breast cancer screening

Robert K. Helmeczi, Can Kavaklioglu, Mucahit Cevik et al.

Breast cancer is a common and deadly disease, but it is often curable when diagnosed early. While most countries have large-scale screening programs, there is no consensus on a single globally accepted guideline for breast cancer screening. The complex nature of the disease; the limited availability of screening methods such as mammography, magnetic resonance imaging (MRI), and ultrasound; and public health policies all factor into the development of screening policies. Resource availability concerns necessitate the design of policies which conform to a budget, a problem which can be modelled as a constrained partially observable Markov decision process (CPOMDP). In this study, we propose a multi-objective CPOMDP model for breast cancer screening which allows for supplemental screening methods to accompany mammography. The model has two objectives: maximize the quality-adjusted life years (QALYs) and minimize lifetime breast cancer mortality risk (LBCMR). We identify the Pareto frontier of optimal solutions for average and high-risk patients at different budget levels, which can be used by decision-makers to set policies in practice. We find that the policies obtained by using a weighted objective are able to generate well-balanced QALYs and LBCMR values. In contrast, the single-objective models generally sacrifice a substantial amount in terms of QALYs/LBCMR for a minimal gain in LBCMR/QALYs. Additionally, our results show that, with the baseline cost values for supplemental screenings as well as the additional disutility that they incur, they are rarely recommended in CPOMDP policies, especially in a budget-constrained setting. A sensitivity analysis reveals the thresholds on cost and disutility values at which supplemental screenings become advantageous to prescribe.
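To make the Pareto-frontier idea in this abstract concrete, here is a minimal sketch of filtering non-dominated policies under two objectives (maximize QALYs, minimize LBCMR). The policy labels and numbers are hypothetical and chosen only for illustration; this is not the paper's CPOMDP solution method, only the dominance filter applied to already-evaluated policies.

```python
from typing import List, Tuple

# Each policy is (label, qalys, lbcmr): QALYs are maximized, LBCMR is minimized.
Policy = Tuple[str, float, float]


def dominates(a: Policy, b: Policy) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_frontier(policies: List[Policy]) -> List[Policy]:
    """Keep the policies that no other policy dominates (input order preserved)."""
    return [p for p in policies if not any(dominates(q, p) for q in policies)]


# Hypothetical evaluated policies (illustrative values, not results from the paper).
candidates: List[Policy] = [
    ("A", 40.10, 0.030),
    ("B", 40.05, 0.025),
    ("C", 40.00, 0.031),  # worse than A on both objectives
    ("D", 39.90, 0.024),
]
frontier = pareto_frontier(candidates)
```

A decision-maker would then pick among the surviving policies according to how QALYs and mortality risk are traded off at a given budget level.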

8.9 | SE | Nov 10, 2020 | Code

Wayback Machine: A tool to capture the evolutionary behaviour of the bug reports and their triage process in open-source software systems

Hadi Jahanshahi, Mucahit Cevik, José Navas-Sú et al.

The issue tracking system (ITS) is a rich data source for data-driven decision-making. Different characteristics of bugs, such as severity, priority, and time to fix, provide a clear picture of an ITS. Nevertheless, such information may be misleading. For example, the exact time and the effort spent on a bug might be significantly different from the actual reporting time and the fixing time. Similarly, these values may be subjective, e.g., severity and priority values are assigned based on the intuition of a user or a developer rather than a structured and well-defined procedure. Hence, we explore the evolution of the bug dependency graph together with priority and severity levels to explore the actual triage process. Inspired by the idea of the "Wayback Machine" for the World Wide Web, we aim to reconstruct the historical decisions made in the ITS. Therefore, any bug prioritization or bug triage algorithms/scenarios can be applied in the same environment using our proposed ITS Wayback Machine. More importantly, we track the evolutionary metrics in the ITS when a custom triage/prioritization strategy is employed. We test the efficiency of the proposed algorithm using data extracted from three open-source projects. Our empirical study sheds light on the overlooked evolutionary metrics--e.g., overdue bugs and developers' loads--which are facilitated via our proposed past-event re-generator.

2.4 | OC | Dec 26, 2023

Dynamic AGV Task Allocation in Intelligent Warehouses

Arash Dehghan, Mucahit Cevik, Merve Bodur

This paper explores the integration of Automated Guided Vehicles (AGVs) in warehouse order picking, a crucial and cost-intensive aspect of warehouse operations. The booming AGV industry, accelerated by the COVID-19 pandemic, is witnessing widespread adoption due to its efficiency, reliability, and cost-effectiveness in automating warehouse tasks. This paper focuses on enhancing the picker-to-parts system, prevalent in small to medium-sized warehouses, through the strategic use of AGVs. We discuss the benefits and applications of AGVs in various warehouse tasks, highlighting their transformative potential in improving operational efficiency. We examine the deployment of AGVs by leading companies in the industry, showcasing their varied functionalities in warehouse management. Addressing the gap in research on optimizing operational performance in hybrid environments where humans and AGVs coexist, our study delves into a dynamic picker-to-parts warehouse scenario. We propose a novel Neural Approximate Dynamic Programming approach for coordinating a mixed team of human and AGV workers, aiming to maximize order throughput and operational efficiency. This involves innovative solutions for non-myopic decision making, order batching, and battery management. We also discuss the integration of advanced robotics technology in automating the complete order-picking process. Through a comprehensive numerical study, our work offers valuable insights for managing a heterogeneous workforce in a hybrid warehouse setting, contributing significantly to the field of warehouse automation and logistics.

3.3 | AI | Jul 2, 2025

Joint Matching and Pricing for Crowd-shipping with In-store Customers

Arash Dehghan, Mucahit Cevik, Merve Bodur et al.

This paper examines the use of in-store customers as delivery couriers in a centralized crowd-shipping system, targeting the growing need for efficient last-mile delivery in urban areas. We consider a brick-and-mortar retail setting where shoppers are offered compensation to deliver time-sensitive online orders. To manage this process, we propose a Markov Decision Process (MDP) model that captures key uncertainties, including the stochastic arrival of orders and crowd-shippers, and the probabilistic acceptance of delivery offers. Our solution approach integrates Neural Approximate Dynamic Programming (NeurADP) for adaptive order-to-shopper assignment with a Deep Double Q-Network (DDQN) for dynamic pricing. This joint optimization strategy enables multi-drop routing and accounts for offer acceptance uncertainty, aligning more closely with real-world operations. Experimental results demonstrate that the integrated NeurADP + DDQN policy achieves notable improvements in delivery cost efficiency, with up to 6.7% savings over NeurADP with fixed pricing and approximately 18% over myopic baselines. We also show that allowing flexible delivery delays and enabling multi-destination routing further reduces operational costs by 8% and 17%, respectively. These findings underscore the advantages of dynamic, forward-looking policies in crowd-shipping systems and offer practical guidance for urban logistics operators.

5.5 | SE | May 16, 2023

Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs

Garima Malik, Mucahit Cevik, Ayşe Başar

This paper explores the use of text data augmentation techniques to enhance conflict and duplicate detection in software engineering tasks through sentence pair classification. The study adapts generic augmentation techniques such as shuffling, back translation, and paraphrasing and proposes new data augmentation techniques such as Noun-Verb Substitution, target-lemma replacement and Actor-Action Substitution for software requirement texts. A comprehensive empirical analysis is conducted on six software text datasets to identify conflicts and duplicates among sentence pairs. The results demonstrate that data augmentation techniques have a significant impact on the performance of all software pair text datasets. On the other hand, in cases where the datasets are relatively balanced, the use of augmentation techniques may result in a negative effect on the classification performance.

1.6 | LG | Sep 5, 2021

VARGAN: Variance Enforcing Network Enhanced GAN

Sanaz Mohammadjafari, Mucahit Cevik, Ayse Basar

Generative adversarial networks (GANs) are one of the most widely used generative models. GANs can learn complex multi-modal distributions and generate realistic samples. Despite the major success of GANs in generating synthetic data, they might suffer from an unstable training process and mode collapse. In this paper, we introduce a new GAN architecture called variance enforcing GAN (VARGAN), which incorporates a third network to introduce diversity in the generated samples. The third network measures the diversity of the generated samples, which is used to penalize the generator's loss for low diversity samples. The network is trained on the available training data and undesired distributions with limited modality. On a set of synthetic and real-world image data, VARGAN generates a more diverse set of samples compared to the recent state-of-the-art models. High diversity and low computational complexity, as well as fast convergence, make VARGAN a promising model to alleviate mode collapse.

3.6 | IR | Sep 2, 2021

Text Classification for Predicting Multi-level Product Categories

Hadi Jahanshahi, Ozan Ozyegen, Mucahit Cevik et al.

In an online shopping platform, a detailed classification of the products facilitates user navigation. It also helps online retailers keep track of the price fluctuations in a certain industry or special discounts on a specific product category. Moreover, an automated classification system may help to pinpoint incorrect or subjective categories suggested by an operator. In this study, we focus on product title classification of the grocery products. We perform a comprehensive comparison of six different text classification models to establish a strong baseline for this task, which involves testing both traditional and recent machine learning methods. In our experiments, we investigate the generalizability of the trained models to the products of other online retailers, the dynamic masking of infeasible subcategories for pretrained language models, and the benefits of incorporating product titles in multiple languages. Our numerical results indicate that dynamic masking of subcategories is effective in improving prediction accuracy. In addition, we observe that using bilingual product titles is generally beneficial, and neural network-based models perform significantly better than SVM and XGBoost models. Lastly, we investigate the reasons for the misclassified products and propose future research directions to further enhance the prediction models.
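The dynamic masking of infeasible subcategories mentioned in this abstract can be sketched as follows: given a predicted parent category, subcategory logits that do not belong to that parent are set to negative infinity before the softmax, so they receive zero probability. The two-level taxonomy, the subcategory order, and the logits below are hypothetical; the abstract does not specify how the masking is implemented, so treat this as one plausible reading.

```python
import math
from typing import Dict, List

# Hypothetical two-level taxonomy, for illustration only.
TAXONOMY: Dict[str, List[str]] = {
    "dairy": ["milk", "cheese"],
    "bakery": ["bread", "cake"],
}
SUBCATEGORIES: List[str] = ["milk", "cheese", "bread", "cake"]


def masked_softmax(logits: List[float], parent: str) -> List[float]:
    """Softmax over subcategory logits, restricted to children of `parent`."""
    allowed = set(TAXONOMY[parent])
    # Infeasible subcategories get -inf, which becomes probability 0 after exp().
    masked = [
        logit if name in allowed else -math.inf
        for logit, name in zip(logits, SUBCATEGORIES)
    ]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Unmasked, "bread" would win; with parent "dairy" it is ruled out.
logits = [1.0, 0.5, 3.0, 0.2]
probs = masked_softmax(logits, "dairy")
predicted = SUBCATEGORIES[probs.index(max(probs))]
```

The masking only removes choices that contradict the parent prediction; it cannot help when the parent category itself is predicted incorrectly.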

6.5 | LG | May 21, 2021

Word-level Text Highlighting of Medical Texts for Telehealth Services

Ozan Ozyegen, Devika Kabe, Mucahit Cevik

The medical domain is often subject to information overload. The digitization of healthcare, constant updates to online medical repositories, and increasing availability of biomedical datasets make it challenging to effectively analyze the data. This creates additional work for medical professionals who are heavily dependent on medical data to complete their research and consult their patients. This paper aims to show how different text highlighting techniques can capture relevant medical context. This would reduce the doctors' cognitive load and response time to patients by facilitating them in making faster decisions, thus improving the overall quality of online medical services. Three different word-level text highlighting methodologies are implemented and evaluated. The first method uses TF-IDF scores directly to highlight important parts of the text. The second method is a combination of TF-IDF scores and the application of Local Interpretable Model-Agnostic Explanations to classification models. The third method uses neural networks directly to make predictions on whether or not a word should be highlighted. The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms and its performance is improved as the size of the input segment increases.
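The first method described in this abstract, highlighting words directly by their TF-IDF scores, can be illustrated with a small standard-library sketch. The tokenization (lowercased whitespace split), the unsmoothed IDF formula, the `top_k` cutoff, and the sample messages are all assumptions made for this example rather than details taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(docs: List[str]) -> List[Dict[str, float]]:
    """Per-document TF-IDF scores using tf = count/len and idf = ln(N/df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(
            {w: (tf[w] / len(toks)) * math.log(n_docs / df[w]) for w in tf}
        )
    return scores


def highlight(doc: str, scores: Dict[str, float], top_k: int = 3) -> str:
    """Wrap the top_k highest-scoring words of `doc` in square brackets."""
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return " ".join(f"[{w}]" if w.lower() in keep else w for w in doc.split())


# Hypothetical patient messages, for illustration only.
messages = [
    "patient reports severe chest pain",
    "patient reports mild headache",
    "patient asks about refill",
]
all_scores = tfidf_scores(messages)
marked = highlight(messages[0], all_scores[0], top_k=3)
```

Words that appear in every message (here "patient") get a score of zero and are never highlighted, which is the behavior that makes TF-IDF a reasonable baseline for surfacing message-specific terms.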

1.0 CL Apr 26, 2021

Auto Response Generation in Online Medical Chat Services

Hadi Jahanshahi, Syed Kazmi, Mucahit Cevik

Telehealth helps to facilitate access to medical professionals by enabling remote medical services for the patients. These services have gradually become popular over the years with the advent of necessary technological infrastructure. The benefits of telehealth have been even more apparent since the beginning of the COVID-19 crisis, as people have become less inclined to visit doctors in person during the pandemic. In this paper, we focus on facilitating the chat sessions between a doctor and a patient. We note that the quality and efficiency of the chat experience can be critical as the demand for telehealth services increases. Accordingly, we develop a smart auto-response generation mechanism for medical conversations that helps doctors respond to consultation requests efficiently, particularly during busy sessions. We explore over 900,000 anonymous, historical online messages between doctors and patients collected over nine months. We implement clustering algorithms to identify the most frequent responses by doctors and manually label the data accordingly. We then train machine learning algorithms using this preprocessed data to generate the responses. The considered algorithm has two steps: a filtering (i.e., triggering) model to filter out infeasible patient messages and a response generator to suggest the top-3 doctor responses for the ones that successfully pass the triggering phase. The method provides an accuracy of 83.28% for precision@3 and shows robustness to its parameters.
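
The two-step design described in the abstract (a trigger that filters messages, followed by a top-3 response suggestion) can be sketched as below. The score inputs, the 0.5 threshold, and the function names are hypothetical, and the metric is assumed here to be the share of triggered messages whose true response appears among the suggestions, which may differ from the paper's exact definition of precision@3.

```python
def top_k(scores, k=3):
    """Return the k response labels with the highest scores."""
    return sorted(scores, key=lambda label: (-scores[label], label))[:k]


def suggest(trigger_score, response_scores, threshold=0.5, k=3):
    """Step 1: skip messages the trigger rejects. Step 2: suggest top-k."""
    if trigger_score < threshold:
        return []
    return top_k(response_scores, k)


def hit_rate_at_k(suggestions, truths):
    """Share of triggered messages whose true response was suggested."""
    triggered = [(s, t) for s, t in zip(suggestions, truths) if s]
    if not triggered:
        return 0.0
    return sum(t in s for s, t in triggered) / len(triggered)
```

Messages rejected by the trigger return an empty list and are excluded from the metric, so the filter and the generator can be evaluated separately.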

7.5 LG Apr 24, 2021

A Deep Reinforcement Learning Approach for the Meal Delivery Problem

Hadi Jahanshahi, Aysun Bozanta, Mucahit Cevik et al.

We consider a meal delivery service fulfilling dynamic customer requests given a set of couriers over the course of a day. A courier's duty is to pick up an order from a restaurant and deliver it to a customer. We model this service as a Markov decision process and use deep reinforcement learning as the solution approach. We experiment with the resulting policies on synthetic and real-world datasets and compare those with the baseline policies. We also examine the courier utilization for different numbers of couriers. In our analysis, we specifically focus on the impact of the limited available resources in the meal delivery problem. Furthermore, we investigate the effect of intelligent order rejection and re-positioning of the couriers. Our numerical experiments show that, by incorporating the geographical locations of the restaurants, customers, and the depot, our model significantly improves the overall service quality as characterized by the expected total reward and the delivery times. Our results present valuable insights on both the courier assignment process and the optimal number of couriers for different order frequencies on a given day. The proposed model also shows robust performance under a variety of scenarios for real-world implementation.

6.4 SE Mar 5, 2021 Code

Does chronology matter in JIT defect prediction? A Partial Replication Study

Hadi Jahanshahi, Dhanya Jothimani, Ayşe Başar et al.

Just-In-Time (JIT) models detect the fix-inducing changes (or defect-inducing changes). These models are designed based on the assumption that past code change properties are similar to future ones. However, as the system evolves, the expertise of developers and/or the complexity of the system also changes. In this work, we aim to investigate the effect of code change properties on JIT models over time. We also study the impact of using recent data as well as all available data on the performance of JIT models. Further, we analyze the effect of weighted sampling on the performance of fix-inducing properties of JIT models. For this purpose, we used datasets from Eclipse JDT, Mozilla, Eclipse Platform, and PostgreSQL. We used five families of change-code properties such as size, diffusion, history, experience, and purpose. We used Random Forest to train and test the JIT model and Brier Score and the area under the ROC curve for performance measurement. Our paper suggests that the predictive power of JIT models does not change over time. Furthermore, we observed that the chronology of data in JIT defect prediction models can be discarded by considering all the available data. On the other hand, the importance score of families of code change properties is found to oscillate over time. To mitigate the impact of the evolution of code change properties, it is recommended to use a weighted sampling approach in which more emphasis is placed upon the changes occurring closer to the current time. Moreover, since properties such as "Expertise of the Developer" and "Size" evolve with time, the models obtained from old data may exhibit different characteristics compared to those employing the newer dataset. Hence, practitioners should constantly retrain JIT models to include fresh data.
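
Two ingredients named in the abstract, the Brier score and recency-weighted sampling, can be sketched in a few lines. The Brier score below is the standard mean squared difference between predicted probabilities and binary outcomes. The exponential decay with a 90-day half-life is purely an assumption for illustration, since the abstract only says that more emphasis is placed on recent changes.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def recency_weights(ages_in_days, half_life=90.0):
    """Exponential decay: a change half_life days old gets half the weight."""
    return [0.5 ** (age / half_life) for age in ages_in_days]


def weighted_sample(changes, ages_in_days, size, seed=0):
    """Sample training changes with replacement, favoring recent ones."""
    rng = random.Random(seed)
    weights = recency_weights(ages_in_days)
    return rng.choices(changes, weights=weights, k=size)
```

A lower Brier score indicates better-calibrated predictions, and the sampled changes could then be passed to any classifier, such as the Random Forest mentioned in the abstract.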

3.6 SE Mar 5, 2021

Moving from Cross-Project Defect Prediction to Heterogeneous Defect Prediction: A Partial Replication Study

Hadi Jahanshahi, Mucahit Cevik, Ayşe Başar

Software defect prediction heavily relies on the metrics collected from software projects. Earlier studies often used machine learning techniques to build, validate, and improve bug prediction models using either a set of metrics collected within a project or across different projects. However, techniques applied and conclusions derived by those models are restricted by how identical those metrics are. Knowledge coming from those models will not be extensible to a target project if not enough overlapping metrics have been collected in the source projects. To explore the feasibility of transferring knowledge across projects without common labeled metrics, we systematically integrated Heterogeneous Defect Prediction (HDP) by replicating and validating the obtained results. Our main goal is to extend prior research and explore the feasibility of HDP and finally to compare its performance with that of its predecessor, Cross-Project Defect Prediction. We construct an HDP model on different publicly available datasets. Moreover, we propose a new ensemble voting approach in the HDP context to utilize the predictive power of multiple available datasets. The result of our experiment is comparable to that of the original study. However, we also explored the feasibility of HDP in real cases. Our results shed light on the infeasibility of many cases for the HDP algorithm due to its sensitivity to the parameter selection. In general, our analysis gives a deep insight into why and how to perform transfer learning from one domain to another, and in particular, provides a set of guidelines to help researchers and practitioners disseminate knowledge to the defect prediction domain.

9.6 LG Sep 18, 2020

Explainable boosted linear regression for time series forecasting

Igor Ilic, Berk Gorgulu, Mucahit Cevik et al.

Time series forecasting involves collecting and analyzing past observations to develop a model to extrapolate such observations into the future. Forecasting of future events is important in many fields to support decision making as it contributes to reducing future uncertainty. We propose the explainable boosted linear regression (EBLR) algorithm for time series forecasting, which is an iterative method that starts with a base model and explains the model's errors through regression trees. At each iteration, the path leading to the highest error is added as a new variable to the base model. In this regard, our approach can be considered as an improvement over general time series models since it enables incorporating nonlinear features by explaining the residuals. More importantly, using the single rule that contributes most to the error allows for interpretable results. The proposed approach extends to probabilistic forecasting through generating prediction intervals based on the empirical error distribution. We conduct a detailed numerical study with EBLR and compare against various other approaches. We observe that EBLR substantially improves the base model performance through extracted features, and provides performance comparable to other well-established approaches. The interpretability of the model predictions and high predictive accuracy of EBLR make it a promising method for time series forecasting.
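
The probabilistic extension mentioned in the abstract, prediction intervals built from the empirical error distribution, can be illustrated with a minimal sketch. It assumes a set of past residuals (actual minus forecast) is available and shifts a point forecast by empirical quantiles of those residuals; the quantile interpolation rule and the function names are illustrative choices and not the paper's implementation.

```python
def empirical_quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def prediction_interval(point_forecast, residuals, coverage=0.8):
    """Shift the point forecast by the lower and upper residual quantiles."""
    errs = sorted(residuals)
    alpha = (1 - coverage) / 2
    lower = point_forecast + empirical_quantile(errs, alpha)
    upper = point_forecast + empirical_quantile(errs, 1 - alpha)
    return lower, upper
```

Because the interval comes from observed errors rather than a distributional assumption, it can be asymmetric around the point forecast when the residuals are skewed.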

5.8 LG Sep 18, 2020

Evaluation of Local Explanation Methods for Multivariate Time Series Forecasting

Ozan Ozyegen, Igor Ilic, Mucahit Cevik

Being able to interpret a machine learning model is a crucial task in many applications of machine learning. Specifically, local interpretability is important in determining why a model makes particular predictions. Despite the recent focus on AI interpretability, there has been a lack of research on local interpretability methods for time series forecasting, while the few interpretable methods that exist mainly focus on time series classification tasks. In this study, we propose two novel evaluation metrics for time series forecasting: Area Over the Perturbation Curve for Regression and Ablation Percentage Threshold. These two metrics can measure the local fidelity of local explanation models. We extend the theoretical foundation to collect experimental results on two popular datasets, Rossmann sales and electricity. Both metrics enable a comprehensive comparison of numerous local explanation models and find which metrics are more sensitive. Lastly, we provide heuristic reasoning for this analysis.
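
The general idea behind perturbation-based fidelity metrics can be sketched as follows. This is not the paper's definition of Area Over the Perturbation Curve for Regression, which the abstract does not spell out; it is a generic sketch that assumes features are replaced by a baseline value in order of decreasing importance, with the resulting changes in the prediction averaged over all steps. All names are hypothetical.

```python
def area_over_perturbation_curve(predict, x, importances, baseline=0.0):
    """Average prediction change as features are perturbed by importance."""
    # Most important feature first, according to the explanation.
    order = sorted(range(len(x)), key=lambda i: -importances[i])
    original = predict(x)
    perturbed = list(x)
    changes = [0.0]  # step 0: nothing has been perturbed yet
    for i in order:
        perturbed[i] = baseline
        changes.append(original - predict(perturbed))
    return sum(changes) / len(changes)
```

Under this sketch, an explanation that ranks the truly influential features first produces a larger area than one that ranks them last, which is the property such metrics are meant to reward.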

5.0 LG Jun 16, 2020

An empirical study on using CNNs for fast radio signal prediction

Ozan Ozyegen, Sanaz Mohammadjafari, Karim El mokhtari et al.

Accurate radio frequency power prediction in a geographic region is a computationally expensive part of finding the optimal transmitter location using ray tracing software. We empirically analyze the viability of deep learning models to speed up this process. Specifically, deep learning methods including CNNs and UNET are typically used for segmentation, and can also be employed in power prediction tasks. We consider a dataset that consists of radio frequency power values for five different regions with four different frame dimensions. We compare deep learning-based prediction models including RadioUNET and four different variations of the UNET model for the power prediction task. More complex UNET variations improve the model on higher resolution frames such as 256x256. However, using the same models on lower resolutions results in overfitting, and simpler models perform better. Our detailed numerical analysis shows that the deep learning models are effective in power prediction and that they are able to generalize well to new regions.