SE · Jul 10, 2023
COMEX: A Tool for Generating Customized Source Code Representations
Debeshee Das, Noble Saji Mathews, Alex Mathai et al.
Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state-of-the-art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the underlying grammar of the programming language. Current LLMs do not exploit this property of source code: they treat code as a sequence of tokens and overlook key structural and semantic properties that can be extracted from code-views like the Control Flow Graph (CFG), Data Flow Graph (DFG), Abstract Syntax Tree (AST), etc. Unfortunately, generating and integrating code-views for every programming language is cumbersome and time-consuming. To overcome this barrier, we propose COMEX, a framework that allows researchers and developers to create and combine multiple code-views that can be used by machine learning (ML) models for various SE tasks. Some salient features of our tool are: (i) it works directly on source code (which need not be compilable), (ii) it currently supports Java and C#, (iii) it can analyze both method-level and program-level snippets using both intra-procedural and inter-procedural analysis, and (iv) it is easily extendable to other languages as it is built on tree-sitter, a widely used incremental parser that supports over 40 languages. We believe this easy-to-use code-view generation and customization tool will give impetus to research in source code representation learning methods and ML4SE. Tool: https://pypi.org/project/comex; GitHub: https://github.com/IBM/tree-sitter-codeviews; Demo: https://youtu.be/GER6U87FVbU
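COMEX itself builds its code-views with tree-sitter for Java and C#. To illustrate what combining code-views means, here is a minimal sketch that instead uses Python's standard-library `ast` module (this is not COMEX's API). It assumes straight-line Python code, so the control-flow view is simply statement order, and a data-flow edge links the most recent assignment of a name to each later read. The function name `statement_views` is hypothetical.

```python
import ast

def statement_views(source):
    """Return (cfg_edges, dfg_edges) between top-level statement line numbers.

    Simplifying assumptions: straight-line code only (no branches or loops),
    so the CFG is the sequential order of statements; augmented assignments
    such as `x += 1` are not treated as reads.
    """
    stmts = ast.parse(source).body
    cfg = [(a.lineno, b.lineno, "cfg") for a, b in zip(stmts, stmts[1:])]
    dfg = []
    last_def = {}  # variable name -> line of its most recent assignment
    for stmt in stmts:
        reads, writes = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    reads.add(node.id)
                elif isinstance(node.ctx, ast.Store):
                    writes.add(node.id)
        # Reads are resolved before this statement's own writes take effect
        for name in sorted(reads):
            if name in last_def:
                dfg.append((last_def[name], stmt.lineno, f"dfg:{name}"))
        for name in writes:
            last_def[name] = stmt.lineno
    return cfg, dfg

code = "a = 1\nb = a + 2\nc = a * b\n"
cfg, dfg = statement_views(code)
combined = cfg + dfg  # one multi-view edge list; each edge is tagged with its view
print(combined)
```

Tagging every edge with its view keeps the combined graph separable, so a downstream model can use one view, several, or all of them.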
HC · Jul 14, 2021
WAccess -- A Web Accessibility Tool based on WCAG 2.2, 2.1 and 2.0 Guidelines
Kowndinya Boyalakuntla, Akhila Sri Manasa Venigalla, Sridhar Chimalakonda
The vision of providing equal access to all web content for all users makes web accessibility a fundamental goal of today's internet. Web accessibility is the practice of removing barriers from websites that could hinder functionality for users with various disabilities. Web accessibility is measured against accessibility guidelines such as WCAG, GIGW, and so on. WCAG 2.2 is the latest set of guidelines for making websites accessible. The web accessibility tools available from the World Wide Web Consortium (W3C) only conform up to WCAG 2.1, and no tools exist for the latest set of guidelines. Despite the availability of several tools to check the conformity of websites with WCAG 2.1 guidelines, there is a scarcity of tools that are both open source and scalable. To support automated accessibility evaluation of numerous websites against WCAG 2.2, 2.1, and 2.0, we present a tool, WAccess. WAccess highlights violations of 13 guidelines from WCAG 2.0, 9 guidelines from WCAG 2.1, and 7 guidelines from WCAG 2.2 for a given web page on the web console, and suggests fixes for the violations while pointing to the violating code snippet. We evaluated WAccess against 2227 government websites of India and observed a total of about 6.1 million violations.
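The abstract does not list WAccess's individual rules, so as an illustration of a single automated check, the sketch below flags `<img>` elements that have no `alt` attribute, which relates to WCAG success criterion 1.1.1 (Non-text Content). It uses Python's standard `html.parser`. The class name `MissingAltChecker` and the suggested-fix message are assumptions, not WAccess's implementation.

```python
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    """Collects <img> tags that have no alt attribute (cf. WCAG SC 1.1.1)."""

    def __init__(self):
        super().__init__()
        self.violations = []  # (line, column, offending tag text)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; alt="" is still allowed,
        # since an empty alt marks an image as decorative.
        if tag == "img" and "alt" not in dict(attrs):
            line, col = self.getpos()
            self.violations.append((line, col, self.get_starttag_text()))

page = """<html><body>
<img src="logo.png" alt="Company logo">
<img src="chart.png">
</body></html>"""

checker = MissingAltChecker()
checker.feed(page)
checker.close()
for line, col, snippet in checker.violations:
    print(f"line {line}, col {col}: {snippet} -> add a text alternative, e.g. alt=\"...\"")
```

Self-closing tags such as `<img src="x.png"/>` are covered too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.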
SE · Jul 8, 2021
GitQ- Towards Using Badges as Visual Cues for GitHub Projects
Akhila Sri Manasa Venigalla, Kowndinya Boyalakunta, Sridhar Chimalakonda
GitHub hosts millions of software repositories, enabling developers to contribute to many projects in multiple ways. Most of the information about repositories is text-based, in the form of stars, forks, commits, and so on. However, developers willing to contribute to projects on GitHub often find it challenging to select appropriate projects to contribute to or reuse, due to the large number of repositories on GitHub. Further, obtaining the required information often becomes a tedious process, as one has to carefully mine information hidden inside the repository. To alleviate effort-intensive mining procedures, researchers have proposed npm-badges to outline information relating to the build status of a project. However, these badges are static and limited to package dependency and build details. Adding visual cues such as badges to repositories might reduce the search space for developers. Hence, we present GitQ, which automatically augments GitHub repositories with badges representing information about source code and project maintenance. Presenting GitQ as a browser plugin for GitHub could make it easily accessible to developers using GitHub. GitQ is evaluated with 15 developers based on the UTAUT model to understand developer perception of its usefulness. We observed that 11 out of 15 developers perceived GitQ to be useful in identifying the right set of repositories using visual cues such as those generated by GitQ. The source code and tool are available for download on GitHub at https://github.com/gitq-for-github/plugin, and the demo can be found at https://youtu.be/c0yohmIat3A.
SE · Jul 6, 2021
SOCluster- Towards Intent-based Clustering of Stack Overflow Questions using Graph-Based Approach
Abhishek Kumar, Deep Ghadiyali, Sridhar Chimalakonda
The Stack Overflow (SO) platform has a huge dataset of questions and answers driven by interactions between users, but the number of unanswered questions is continuously rising. This issue is common across various community Question & Answering (Q&A) platforms such as Yahoo, Quora and so on. Clustering is one of the approaches used by these communities to address this challenge. Specifically, intent-based clustering could be leveraged to answer unanswered questions using other answered questions in the same cluster, and can also improve the response time for new questions. To this end, we propose SOCluster, an approach and a tool to cluster SO questions based on intent using a graph-based clustering approach. We selected four datasets of 10k, 20k, 30k and 40k SO questions without code snippets or images, and performed intent-based clustering on them. We performed a preliminary evaluation of our tool by analyzing the resultant clusters using the commonly used metrics of Silhouette coefficient, Calinski-Harabasz Index and Davies-Bouldin Index. We performed clustering for 8 different similarity threshold values and analyzed the intriguing trends reflected by the output clusters through the three evaluation metrics. At a 90% similarity threshold, all three evaluation metrics show their best values on all four datasets. The source code and tool are available for download on GitHub at: https://github.com/Liveitabhi/SOCluster, and the demo can be found here: https://youtu.be/uyn8ie4h3NY.
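The abstract does not specify SOCluster's similarity measure or graph algorithm. One common reading of threshold-based graph clustering is: connect two questions whenever their similarity meets the threshold, then take the connected components as clusters. The sketch below assumes Jaccard similarity over lowercase word sets and uses a union-find structure; all names are hypothetical.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word set of a question title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def threshold_clusters(questions, threshold=0.9):
    """Clusters = connected components of the graph whose edges join
    question pairs with similarity >= threshold."""
    parent = list(range(len(questions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    toks = [tokens(q) for q in questions]
    for i, j in combinations(range(len(questions)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)  # adding edge i-j merges their components

    clusters = {}
    for i in range(len(questions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

questions = [
    "How do I reverse a list in Python?",
    "How do I reverse a list in Python",
    "Why does my Java build fail with a missing class?",
]
print(threshold_clusters(questions, threshold=0.9))
```

Sweeping `threshold` (SOCluster evaluates 8 values) trades fewer, larger clusters at low thresholds for many small, tight clusters at high ones.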
SE · Mar 2, 2021
Apples, Oranges & Fruits -- Understanding Similarity of Software Projects Through The Lens of Dissimilar Artifacts
A Eashaan Rao, Sridhar Chimalakonda
The growing availability of open source projects has enabled developers to reuse existing software artifacts and leverage them to develop new software. However, the notion of similarity is hard to pin down, as it varies from developer to developer. Some developers might search for repositories with similar source code, while others might search for repositories with similar requirements or issues. Existing approaches tend to find similar projects by comparing artifacts of the same type, such as source code to source code, API usage to API usage, documentation to documentation, and so on. Yet even when two artifacts of the same type are dissimilar, there could be a similarity between two artifacts of different types. Hence, in this paper, we aim to answer the question: Can we find similarity of software repositories through dissimilar artifacts? To this end, we conduct an experiment to find similarities between three repositories, two similar and one different, by comparing similar and dissimilar artifacts (documentation, commits, and source code). We observed similarities between dissimilar artifacts such as commits, source code, and README files in the context of both similar and different repositories.
SE · Mar 1, 2021
Understanding Emotions of Developer Community Towards Software Documentation
Akhila Sri Manasa Venigalla, Sridhar Chimalakonda
The availability of open-source projects enables developers to contribute to and collaborate on a wide range of projects. As a result, the developer community contributing to such projects is also growing. Many of these projects involve frequent updates and extensive reuse. Well-maintained documentation helps in better understanding a software project and also facilitates efficient contribution and reuse. Though software documentation plays an important role in the development and maintenance of software, it also suffers from various issues, including insufficiency, inconsistency, poor maintainability, and so on. Exploring the perception of developers towards documentation could help in understanding the reasons behind prevalent issues in software documentation. It could further aid in deciding on training that could be given to the developer community towards building more sustainable projects for society. Analyzing the sentiments of contributors to a project could provide insights into developer perceptions. Hence, as a first step in this direction, we analyze the sentiments of commit messages specific to the documentation of a software project. To this end, we considered the commit history of 998 GitHub projects from the GHTorrent dataset and identified 10,996 commits that correspond to the documentation of repositories. Further, we apply sentiment analysis techniques to obtain insights into the type of sentiment expressed in the commit messages of the selected commits. We observe that around 45% of the identified commit messages express the trust emotion.
SE · Dec 21, 2020
AC2 -- Towards Understanding Architectural Changes in Rapid Releases
A Eashaan Rao, Dheeraj Vagavolu, Sridhar Chimalakonda
Open source projects are adopting faster release cycles that reflect various changes. Therefore, it becomes necessary to comprehend the effects of these changes on the software's architecture over releases. However, it is challenging to keep the architecture in check while adding new changes for every release. To this end, we propose a visualization tool called AC2, which allows its users to examine alterations in the architecture of Python projects at both higher and lower levels of abstraction. AC2 uses call graphs and collaboration graphs to show the interaction between different architectural components. The tool provides four different views of the architectural changes. The user can examine two releases at a time to comprehend the architectural changes between them. AC2 can support maintainers and developers in observing changes in the project and their influence on the architecture, allowing them to see its growing complexity over releases at the component level. AC2 can be downloaded at https://github.com/dheerajrox/AC2 and its demo can be seen at https://dheerajrox.github.io/AC2doc or on YouTube at https://www.youtube.com/watch?v=GNrJfZ0RCVI.
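The abstract does not detail how AC2 extracts its graphs. Assuming each release's call graph is available as a collection of (caller, callee) pairs, comparing two releases reduces to set differences, and a per-function fan-out count gives a rough signal of growing complexity. The function names and dotted module names below are hypothetical.

```python
def diff_call_graphs(old_edges, new_edges):
    """Compare two releases' call graphs given as iterables of (caller, callee) pairs."""
    old, new = set(old_edges), set(new_edges)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "unchanged": sorted(old & new),
    }

def fan_out(edges):
    """Number of distinct callees per caller: a rough per-function complexity signal."""
    counts = {}
    for caller, _callee in set(edges):
        counts[caller] = counts.get(caller, 0) + 1
    return counts

release_1 = [("cli.main", "core.load"), ("core.load", "io.read")]
release_2 = release_1 + [("core.load", "cache.get"), ("cli.main", "plugins.init")]

delta = diff_call_graphs(release_1, release_2)
print(delta["added"])  # interactions introduced in release_2
print(fan_out(release_1), fan_out(release_2))
```

Collapsing function names to their module prefix before diffing would give the same comparison at the component level.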
SE · Nov 6, 2020
DRAST -- A Deep Learning and AST Based Approach for Bug Localization
Shubham Sangle, Sandeep Muvva, Sridhar Chimalakonda et al.
Context: Given a bug report and the source code of a project, bug localization can help developers focus on fixing probable buggy files rather than searching the entire source code repository. Existing research uses information retrieval (IR) and/or a combination of machine learning (ML) or deep learning (DL) approaches, but it focuses primarily on benchmark Java projects and also motivates the need for a multi-language bug localization approach. Objective: To create a novel bug localization approach that leverages the syntactic structure of source code and bug report information, and that can support multi-language projects, along with a new dataset of C projects. Method: The proposed DRAST approach represents source code as code vectors using its high-level AST, and combines rVSM, an IR technique, with ML/DL models such as Random Forest and a Deep Neural Network regressor to rank the list of buggy files. We also use features such as textual similarity using IR techniques, lexical mismatch using DNNs, and project history using the metadata of the BugC dataset. Results: We tested DRAST on seven projects from the BugC dataset, which consists of 2462 bug reports from 21 open-source C projects. The results show that DRAST can locate correct buggy files 90% of the time within the top 1, 5, and 10 suggested files, with MAP and MRR scores above 90% for the seven randomly selected projects. We also tested DRAST on Tomcat and AspectJ, projects from the benchmark dataset, obtaining better results at accuracy@1, MAP and MRR compared with the state of the art. Conclusions: This paper presents a novel bug localization approach that works on C and Java projects, a bug localization C dataset, and a novel source code representation. The results for C projects using DRAST are promising and could motivate researchers and practitioners to focus on developing multi-language bug localization approaches.
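DRAST is evaluated with accuracy@k, MAP and MRR over ranked lists of files. The sketch below uses the standard definitions of these metrics (it is not DRAST's evaluation code): each bug report contributes a ranked list of suggested files together with the set of files actually fixed, and average precision is normalized by the number of relevant files.

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant file, or 0.0 if none is retrieved."""
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant file appears."""
    hits, total = 0, 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def evaluate(results, k=1):
    """results: list of (ranked_files, relevant_files) pairs, one per bug report."""
    n = len(results)
    return {
        f"accuracy@{k}": sum(any(p in rel for p in ranked[:k]) for ranked, rel in results) / n,
        "MAP": sum(average_precision(r, rel) for r, rel in results) / n,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
    }

results = [
    (["a.c", "b.c", "c.c"], {"b.c"}),         # first hit at rank 2
    (["d.c", "e.c", "f.c"], {"d.c", "f.c"}),  # hits at ranks 1 and 3
]
print(evaluate(results, k=1))
```

MRR rewards only the first correct file, while MAP also accounts for where the remaining buggy files land, which matters when a fix spans several files.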
SE · Jun 23, 2020
A Catalogue of Game-Specific Software Nuggets
Vartika Agrahari, Sridhar Chimalakonda
With the ever-increasing use of games, game developers are expected to write efficient code supporting several qualities such as security, maintainability, and performance. However, the continuous need to update game features in less time might compel developers to use anti-patterns, code smells and quick-fix solutions that may affect the functional and non-functional requirements of the game. These bad practices may lead to technical debt and poor program comprehension, and can cause several issues during software maintenance. In this paper, we introduce "Software Nuggets" as a concept that negatively affects software quality, defined as a superset of anti-patterns, code smells, bugs, and bad software practices. We call these Software Nuggets "G-Nuggets" in the context of games. While there is empirical research on games, we are not aware of any work on understanding and cataloguing these G-Nuggets. Thus, we propose a catalogue of G-Nuggets by mining and analyzing 892 commits, 189 issues, and 104 pull requests from 100 open-source GitHub game repositories. We use regular expressions and thematic analysis on this dataset to catalogue game-specific Software Nuggets. We present a catalogue of ten G-Nuggets, with examples available online at: https://phoebs88.github.io/A-Catalogue-of-Game-Specific-Software-Nuggets. We believe this catalogue might help researchers conduct further empirical research in the domain of games, and help game developers improve the quality of games.
SE · Apr 19, 2020
BuGL -- A Cross-Language Dataset for Bug Localization
Sandeep Muvva, A Eashaan Rao, Sridhar Chimalakonda
Bug localization is the process of locating potentially error-prone files or methods from a given bug report and source code. There is extensive research on bug localization in the literature that focuses on applying information retrieval techniques, machine learning/deep learning approaches, or both, to detect the location of bugs. The common premise for all approaches is the availability of a good dataset, which in this case is the standard benchmark dataset that comprises 6 Java projects and, in some cases, more than 6 Java projects. The existing datasets do not include projects in other programming languages, despite the need to investigate both specific and cross-project bug localization. To the best of our knowledge, no existing dataset addresses this concern. In this paper, we present BuGL, a large-scale cross-language dataset. BuGL consists of more than 10,000 bug reports drawn from open-source projects written in four programming languages, namely C, C++, Java, and Python. The dataset includes information such as bug reports and pull requests. BuGL aims to open up new research opportunities in the area of bug localization.
SE · Jan 31, 2020
StackEmo-Towards Enhancing User Experience by Augmenting Stack Overflow with Emojis
Akhila Sri Manasa Venigalla, Sridhar Chimalakonda
With the increasing acceptance of open source platforms for knowledge sharing, Question and Answer (Q&A) websites such as Stack Overflow have become increasingly popular in the programming domain. Many novice programmers visit Stack Overflow for reasons that include posing questions and finding answers to issues they come across while programming. Practitioners voluntarily answer questions on Stack Overflow based on their experience or prior knowledge. Most of these answers are also accompanied by comments from Stack Overflow users. Questions, answers and comments on Stack Overflow also carry the sentiments of users, which, when analysed and presented, could motivate users to read and contribute to the posts. However, the sentiment of these posts is not depicted on the current Stack Overflow platform. There is extensive research on analysing sentiments on social networking platforms such as Twitter. Representing the sentiment of a post might motivate users to follow or answer certain posts. While several tools augment or annotate the Stack Overflow platform for developers, we are not aware of tools that deal with the sentiment of posts. In this paper, we propose StackEmo, a Google Chrome plugin that augments comments on Stack Overflow with emojis based on the sentiment of the comments posted, with the aim of providing users with visual cues that could motivate them to review and contribute to available comments. We evaluated StackEmo through an in-user Likert-scale survey with 30 university students. The results of the survey provided insights on improving StackEmo, with 83% of participants having recommended the plugin to their peers.
SE · May 5
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
P Akilesh, Leuson Da Silva, Foutse Khomh et al.
Static analysis tools are essential for ensuring memory safety in Rust programs, particularly as Rust gains adoption in safety-critical domains. However, existing tools such as Rudra and MirChecker suffer from high false positive rates, which diminish developer trust, increase manual review effort, and may obscure genuine vulnerabilities. This paper presents a novel reinforcement learning (RL)-based approach for automatically classifying and suppressing spurious warnings in static memory safety analysis for Rust. To achieve this, we design an RL agent that learns a warning suppression policy by extracting contextual features from Rust's Mid-level Intermediate Representation (MIR) and optimizing its decisions through interaction with static analysis outputs. To improve decision quality, we integrate dynamic validation via cargo-fuzz as an auxiliary feedback mechanism, allowing the agent to selectively validate suspicious warnings through targeted fuzz testing. Our evaluation shows that the proposed approach significantly outperforms state-of-the-art LLM-based baselines, achieving 65.2% accuracy and an F1 score of 0.659, an improvement of 17.1% over the best LLM baseline. With a recall of 74.6%, our method successfully identifies nearly three-quarters of true bugs while substantially reducing false positives, improving precision from 25.6% in raw Rudra output to 59.0%. Incorporating dynamic fuzzing further boosts performance, yielding additional improvements of 10.7 percentage points in accuracy and 8.6 percentage points in F1 score over the RL-only variant. Overall, our work demonstrates that combining reinforcement learning with hybrid static-dynamic analysis can substantially reduce false positives and improve the practical usability of memory safety verification tools for Rust.
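As a consistency check on the reported figures (our arithmetic, not taken from the paper), F1 is the harmonic mean of precision and recall. The reported precision of 59.0% and recall of 74.6% reproduce the stated F1 score of 0.659.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported figures for the RL-based approach: precision 59.0%, recall 74.6%
print(round(f1_score(0.590, 0.746), 3))  # 0.659, matching the reported F1
```

Accuracy (65.2%) cannot be re-derived in the same way, because it also depends on the true negatives, which the abstract does not report.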
SE · Nov 21, 2024
CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs
Alex Mathai, Kranthi Sedamaki, Debeshee Das et al.
Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspired by natural language processing (NLP), have shown remarkable success in SE tasks. However, source code contains structural and semantic properties embedded within its grammar, which can be extracted from structured code-views like the Abstract Syntax Tree (AST), Data-Flow Graph (DFG), and Control-Flow Graph (CFG). These code-views can complement NLP techniques, further improving SE tasks. Unfortunately, there are no flexible frameworks to effectively infuse arbitrary code-views into existing transformer-based models. Therefore, in this work, we propose CodeSAM, a novel scalable framework to infuse multiple code-views into transformer-based models by creating self-attention masks. We use CodeSAM to fine-tune a small language model (SLM) like CodeBERT on the downstream SE tasks of semantic code search, code clone detection, and program classification. Experimental results show that by using this technique, we improve downstream performance compared to SLMs like GraphCodeBERT and CodeBERT on all three tasks, using individual code-views or a combination of code-views during fine-tuning. We believe these results indicate that techniques like CodeSAM can help create compact yet performant code SLMs that fit in resource-constrained settings.
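The abstract does not give CodeSAM's exact masking scheme. The sketch below shows the general idea of turning code-view edges into a self-attention mask, under two stated assumptions: each graph node is aligned one-to-one with a token, and edges are treated as undirected. Disallowed positions receive a score of negative infinity before the softmax, so they get zero attention weight. All names and the 4-token example are hypothetical.

```python
import math

def view_mask(n_tokens, views, selected):
    """Boolean n x n mask: token i may attend to token j if i == j or an
    edge (i, j) exists in any selected code-view (treated as undirected)."""
    mask = [[i == j for j in range(n_tokens)] for i in range(n_tokens)]
    for name in selected:
        for i, j in views[name]:
            mask[i][j] = mask[j][i] = True
    return mask

def masked_softmax(scores, mask):
    """Row-wise softmax in which disallowed positions get -inf, i.e. zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        logits = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        top = max(logits)  # finite, because the diagonal is always allowed
        exps = [math.exp(x - top) for x in logits]  # math.exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical 4-token snippet: node i of each view is aligned with token i
views = {"ast": [(0, 1), (1, 2)], "dfg": [(0, 3)]}
scores = [[0.0] * 4 for _ in range(4)]  # uniform raw attention scores

ast_only = masked_softmax(scores, view_mask(4, views, ["ast"]))
ast_dfg = masked_softmax(scores, view_mask(4, views, ["ast", "dfg"]))
print(ast_only[0], ast_dfg[0])
```

Choosing which views go into `selected` is what lets the same model be fine-tuned with a single code-view or a combination of them.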
SE · Aug 24, 2025
Who Wins the Race? (R Vs Python) - An Exploratory Study on Energy Consumption of Machine Learning Algorithms
Rajrupa Chattaraj, Sridhar Chimalakonda, Vibhu Saujanya Sharma et al.
The utilization of Machine Learning (ML) in contemporary software systems is extensive and continually expanding. However, its usage is energy-intensive, contributing to increased carbon emissions and demanding significant resources. While numerous studies examine the performance and accuracy of ML, only a few focus on its environmental aspects, particularly energy consumption. In addition, despite emerging efforts to compare energy consumption across programming languages for specific algorithms and tasks, a gap remains in comparing these languages for ML-based tasks. This paper aims to raise awareness of the energy costs associated with employing different programming languages for ML model training and inference. Through this empirical study, we measure and compare the energy consumption, along with the run-time performance, of five regression and five classification tasks implemented in Python and R, the two most popular programming languages in this context. Our results reveal a statistically significant difference in costs between the two languages in 95% of the cases examined. Furthermore, our analysis demonstrates that the choice of programming language can significantly influence energy efficiency, by up to 99.16% during model training and up to 99.8% during inference, for a given ML task.
HC · Jul 13, 2021
ML-Quest: A Game for Introducing Machine Learning Concepts to K-12 Students
Shruti Priya, Shubhankar Bhadra, Sridhar Chimalakonda
Today, Machine Learning (ML) is of great importance to society due to the availability of huge amounts of data and high computational resources. This has led to the introduction of ML concepts at multiple levels of education, including K-12, to promote computational thinking. However, teaching these concepts to K-12 students through traditional methodologies such as video lectures and books is challenging. Many studies in the literature have reported that using interactive environments such as games to teach computational thinking and programming improves retention and motivation among students. Therefore, introducing ML concepts through a game might enhance students' understanding of the subject and motivate them to learn further. However, we are not aware of any existing game that explicitly focuses on introducing ML concepts to students through game play. Hence, in this paper, we propose ML-Quest, a 3D video game that provides a conceptual overview of three ML concepts: Supervised Learning, Gradient Descent and K-Nearest Neighbor (KNN) Classification. The crux of the game is to introduce the definition and working of these concepts, which we call a conceptual overview, in a simulated scenario without overwhelming students with the intricacies of ML. The game has been evaluated primarily for its usefulness and player experience using the Technology Acceptance Model (TAM) with 23 higher-secondary school students. The survey results show that around 70% of the participants either agree or strongly agree that ML-Quest is quite interactive and useful in introducing them to ML concepts.
SE · Jul 6, 2021
COSPEX: A Program Comprehension Tool for Novice Programmers
Ashutosh Rajput, Nakshatra Gupta, Sridhar Chimalakonda
Developers often encounter unfamiliar code during software maintenance, which takes a significant amount of time to comprehend, especially for novice programmers. Automated techniques that analyze source code and present key information to developers can lead to effective comprehension of the code. Researchers have come up with automated code summarization techniques that focus on generating brief summaries rather than aiding comprehension. Existing debuggers represent the execution states of the program, but they do not show the complete execution at a single point. Studies have revealed that the effort required for program comprehension can be reduced if novice programmers are provided with worked examples. Hence, we propose COSPEX (Comprehension using Summarization via Program Execution), an Atom plugin that dynamically extracts key information for every line of code executed and presents it to developers in the form of an interactive, example-like dynamic information instance. As a preliminary evaluation, we presented 14 undergraduates with up to one year of Python programming experience with a code comprehension task in a user survey. We observed that COSPEX helped novice programmers in program comprehension and improved their understanding of code execution. The source code and tool are available at: https://bit.ly/3utHOBM, and the demo on YouTube is available at: https://bit.ly/2Sp08xQ.
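The abstract does not describe COSPEX's internals. A common way to extract per-line execution information in Python is the standard `sys.settrace` hook; the sketch below records, before each line of a target function runs, the line's offset from the `def` line and a snapshot of the local variables. The helper `trace_lines` and the example function are hypothetical.

```python
import sys

def trace_lines(func, *args):
    """Run func(*args) and record, before each line of func's body executes,
    (line offset from the def line, snapshot of local variables)."""
    steps = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # ignore frames of other functions
        if event == "line":
            offset = frame.f_lineno - target.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook
    return result, steps

def running_sum(values):
    total = 0
    for v in values:
        total += v
    return total

result, steps = trace_lines(running_sum, [1, 2])
for offset, local_vars in steps:
    print(f"+{offset}: {local_vars}")
```

Because the "line" event fires before a line runs, each snapshot shows the state the line starts from; pairing consecutive snapshots yields the per-line changes that an example-like view could display.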
HC · Jun 22, 2021
MuseumViz -- Towards Visualizing Online Museum Collections
Dheeraj Vagavolu, Akhila Sri Manasa Venigalla, Sridhar Chimalakonda
Despite the growth of online museums for India's cultural heritage data, the number of visitors has grown only modestly. Over the years, online museums have adopted many techniques to improve the overall user experience. However, many Indian online museums display artifacts as lists and grids with basic search functionality, making them less visually appealing and difficult to comprehend. Our work aims to enhance the user experience of accessing Indian online museums by utilizing advances in information visualization. Hence, we propose MuseumViz, a framework that processes data from online museums and visualizes it using four different interactive visualizations: the Network Graph, TreeMap, Polygon Chart and SunBurst Chart. We demonstrate MuseumViz on a total of 723 cultural heritage artifacts present in the Archaeological Survey of India, Goa. Based on our evaluation with 25 users, about 83% of them find it easier and more comprehensible to browse cultural heritage artifacts through MuseumViz.
SE · Jun 21, 2021
On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study
Karthik Chandra Swarna, Noble Saji Mathews, Dheeraj Vagavolu et al.
Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs such as Control Flow Graph (CFG) and Program Dependency Graph (PDG), which contain information about source code that AST does not. Even though some works tried to utilize multiple representations, they do not provide any insights about the costs and benefits of using multiple representations. The primary goal of this paper is to discuss the implications of utilizing multiple code representations, specifically AST, CFG, and PDG. We modify an AST path-based approach to accept multiple representations as input to an attention-based model. We do this to measure the impact of additional representations (such as CFG and PDG) over AST. We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection. Our approach increases the performance on these tasks by 11% (F1), 15.7% (Accuracy), and 9.3% (F1), respectively, over the baseline. In addition to the effect on performance, we discuss timing overheads incurred with multiple representations. We envision this work providing researchers with a lens to evaluate combinations of code representations for various tasks.
CY · Jun 18, 2021
Detox Browser -- Towards Filtering Sensitive Content On the Web
Noble Saji Mathews, Sridhar Chimalakonda
The annual consumption of web-based resources is increasing at a very fast rate, mainly due to the growing affordability and accessibility of the internet. Many people rely on the web to get diverse perspectives, but at the same time, it can expose them to content that is harmful to their mental well-being. Catchy headlines and emotionally charged articles increase the number of readers, which in turn increases ad revenue for websites. When a user consumes a large quantity of negative content, it adversely impacts the user's happiness and significantly affects their mood and state of mind. Many studies carried out during the COVID-19 pandemic have shown that people across the globe, irrespective of their country of origin, have experienced higher levels of anxiety and depression. Web filters can help construct a digital environment that is more suitable for people prone to depression, anxiety and stress. A significant amount of work has been done in the field of web filtering, but there has been limited focus on helping Highly Sensitive Persons (HSPs) or those with stress disorders induced by trauma. Through this paper, we propose Detox Browser, a simple tool that enables end users to tune out of, or control, their exposure to topics that can affect their mental well-being. The extension uses sentiment analysis and keywords to filter flagged content out of Google search results, and warns users if any blacklisted topics are detected while they navigate across websites.
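The filtering step described above can be sketched as a topic blacklist check combined with a sentiment score. In this sketch, the toy negative-word lexicon and the 0.2 cut-off are hypothetical stand-ins for a real sentiment model, and plain result titles stand in for Google search results. It is not Detox Browser's implementation.

```python
import re

# Hypothetical toy lexicon standing in for a real sentiment model
NEGATIVE_WORDS = {"tragic", "crisis", "panic", "disaster", "death"}

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def negativity(text):
    """Fraction of words in text that appear in the negative lexicon."""
    ws = words(text)
    return sum(w in NEGATIVE_WORDS for w in ws) / len(ws) if ws else 0.0

def filter_results(titles, blacklist, max_negativity=0.2):
    """Split result titles into (kept, flagged) using a single-word topic
    blacklist and a negativity threshold."""
    kept, flagged = [], []
    for title in titles:
        if set(words(title)) & blacklist or negativity(title) > max_negativity:
            flagged.append(title)
        else:
            kept.append(title)
    return kept, flagged

titles = [
    "Tragic crisis causes panic downtown",
    "Ten easy pasta recipes",
    "Election results announced",
]
kept, flagged = filter_results(titles, blacklist={"election"})
print("shown:", kept)
print("hidden:", flagged)
```

Multi-word topics (for example "stock market") would need phrase matching rather than the single-word set intersection used here.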
HC · Apr 17, 2021
SurviveCovid-19++ : A collaborative healthcare game towards educating people about safety measures and vaccination for Covid-19
Akhila Sri Manasa Venigalla, Dheeraj Vagavolu, Sridhar Chimalakonda
Covid-19 has been affecting populations across the world for more than a year, with diverse strains of the virus identified in many countries. Vaccines to help curb the virus are being developed and administered. Preventing the spread of the disease requires collaborative efforts from everyone. People with varied professional backgrounds have varied responsibilities in controlling the pandemic. It is important that everyone is aware of their respective responsibilities and also empathizes with the efforts and duties of other individuals. Hence, we wish to leverage the potential of games in the healthcare domain to educate people about Covid-19. With the aim of educating the population about vaccination against Covid-19 and the responsibilities of citizens with varied professional backgrounds, and of emphasizing the need for collaboration in fighting the pandemic by following safety measures, we present SurviveCovid-19++, a collaborative multiplayer desktop-based game. The game revolves around four roles (doctor, sanitation worker, citizen and law enforcer) delivering their duties, following safety measures and collaboratively clearing multiple stages in the game. We performed a preliminary evaluation of the game through a qualitative and quantitative user survey. The results of the user survey were encouraging, with volunteers expressing increased empathy towards the efforts of individuals with varied professional backgrounds and a better understanding of the importance of safety measures against Covid-19.
SE · Feb 25, 2021
What's in a GitHub Repository? -- A Software Documentation Perspective
Akhila Sri Manasa Venigalla, Sridhar Chimalakonda
Developers use and contribute to repositories on GitHub. Documentation present in a repository is an important resource that helps developers understand, maintain and contribute to the project. Currently, documentation in a repository is spread across various files, with most of it present in ReadMe files. However, other software artifacts in the repository, such as issue reports and pull requests, could also contribute to documentation even though they are not explicitly designated as documentation. Hence, in this paper, we propose a taxonomy of documentation sources derived from an analysis of different software artifacts, developer interviews and a card-sorting approach. We inspected multiple artifacts of 950 public GitHub repositories written in four programming languages (C++, C#, Python and Java), and analyzed the type and amount of documentation that could be extracted from these artifacts. We observe that about 25.93% of the information extracted from all sources proposed in the taxonomy contains error-related documentation, and that pull requests contribute around 18.21% of the extracted information.
SE · Feb 18, 2021
APIScanner -- Towards Automated Detection of Deprecated APIs in Python Libraries
Aparna Vadlamani, Rishitha Kalicheti, Sridhar Chimalakonda
Python libraries are widely used for machine learning and scientific computing tasks today. As in other languages, APIs in Python libraries are deprecated due to feature enhancements and bug fixes, and developers are discouraged from using deprecated APIs in further software development. Manually detecting and replacing deprecated APIs is a tedious and time-consuming task due to the large number of API calls used in projects. Moreover, the lack of proper documentation for these deprecated APIs makes the task challenging. To address this challenge, we propose an algorithm and a tool, APIScanner, that automatically detects deprecated APIs in Python libraries. The algorithm parses the source code of the libraries using abstract syntax trees (ASTs) and identifies deprecated APIs via decorators, hard-coded warnings or comments. APIScanner is a Visual Studio Code extension that highlights and warns the developer about the use of deprecated API elements while they write source code. The tool can help developers avoid using deprecated API elements without executing the code. We tested our algorithm and tool on six popular Python libraries, and they detected 838 of 871 deprecated API elements. Demo of APIScanner: https://youtu.be/1hy_ugf-iek. Documentation, tool, and source code can be found here: https://rishitha957.github.io/APIScanner.
HC · Oct 13, 2020
EmoG -- Towards Emojifying Gmail Conversations
Akhila Sri Manasa Venigalla, Sridhar Chimalakonda
Emails are one of the most frequently used media of communication today across multiple domains, including industry and educational institutions. Understanding the sentiments expressed in an email could have a considerable impact on the recipients' action or response to the email. However, it is difficult to interpret the sender's emotions from plain text in which emotions are not explicitly expressed. Researchers have tried to predict customer attrition by integrating emotions into emails exchanged in client-company settings. However, most existing works deal with static assessment of email emotions. Presenting the sentiments of emails dynamically could help readers understand senders' emotions and could also influence readers' actions. Hence, in this paper, we present EmoG, a Google Chrome extension intended to support university students. It augments emails with emojis based on the sentiment conveyed in the email; these emojis might also offer a faster overview of email sentiments and act as tags that could help in automatic sorting and processing of emails. Currently, EmoG supports the Gmail inbox on the Google Chrome browser, and could be extended to other inboxes and browsers with ease. We conducted a user survey with 15 university students to understand the usefulness of EmoG and received positive feedback.
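The sentiment-to-emoji step could look something like the sketch below, assuming some upstream sentiment analyzer already returns a polarity score in [-1, 1] for each email. The thresholds, the five-emoji scale and the grouping helper are hypothetical choices for illustration, not EmoG's actual design.

```python
from collections import defaultdict

# Hypothetical scale: (exclusive upper bound on polarity, emoji).
EMOJI_SCALE = [
    (-0.6, "\U0001F620"),  # angry face
    (-0.2, "\U0001F641"),  # slightly frowning face
    (0.2, "\U0001F610"),   # neutral face
    (0.6, "\U0001F642"),   # slightly smiling face
]
TOP_EMOJI = "\U0001F600"   # grinning face, for scores >= 0.6


def emoji_for(score: float) -> str:
    """Map a polarity score in [-1, 1] to an emoji tag."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for upper, emoji in EMOJI_SCALE:
        if score < upper:
            return emoji
    return TOP_EMOJI


def group_by_emoji(scores: dict[str, float]) -> dict[str, list[str]]:
    """Group email subjects by their emoji tag, e.g. to sort an inbox view."""
    groups: dict[str, list[str]] = defaultdict(list)
    for subject, score in scores.items():
        groups[emoji_for(score)].append(subject)
    return dict(groups)
```

Using the emoji itself as the grouping key mirrors the idea of emojis acting as tags for sorting.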
CY · Jun 3, 2020
AiR -- An Augmented Reality Application for Visualizing Air Pollution
Noble Saji Mathews, Sridhar Chimalakonda, Suresh Jain
Air quality is a term used to describe the concentration levels of various pollutants in the air we breathe. Air quality, which is degrading rapidly across the globe, has been a source of great concern, and governments worldwide are taking various measures to reduce air pollution. Raising public awareness of environmental pollution plays a major role in controlling air pollution, as the programs proposed by governments require public support. Information on air quality is available on multiple portals, such as that of the Central Pollution Control Board (CPCB), which publishes an Air Quality Index (AQI) that is accessible to the public. However, such portals are rarely visited by the general public. Visualizing air quality at the location where an individual resides could help raise awareness among the public, and this visualization could be rendered using Augmented Reality techniques. Considering the widespread use of Android-based mobile devices in India and the importance of air quality visualization, we present AiR, an Android-based mobile application. AiR takes the air quality measured by CPCB in a locality detected by the user's GPS, or in a locality of the user's choice, visualizes the various air pollutants present in that locality ($\mathrm{PM}_{10}$, $\mathrm{PM}_{2.5}$, $\mathrm{NO}_2$, $\mathrm{SO}_2$, $\mathrm{CO}$, $\mathrm{O}_3$ and $\mathrm{NH}_3$), and displays them in the user's surroundings. AiR also creates awareness, in an interactive manner, about the different pollutants, their sources, and their impacts on health.
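For context on the Air Quality Index mentioned above: an index of this kind is usually computed per pollutant by linear interpolation between concentration breakpoints, $I = \frac{I_{hi} - I_{lo}}{C_{hi} - C_{lo}}(C - C_{lo}) + I_{lo}$, with the overall index taken as the highest sub-index. The sketch below assumes that scheme and uses PM2.5 and PM10 breakpoints (24-hour averages, µg/m³) as we understand them from CPCB's National AQI table; verify them against the official table before relying on them. The official method also has minimum-data rules that this sketch ignores, and nothing here describes how AiR itself processes CPCB data.

```python
# Each row: (C_lo, C_hi, I_lo, I_hi). Values assumed from CPCB's National AQI
# breakpoint table (24-h averages, ug/m3); the open-ended "Severe" band is omitted.
PM25_BREAKPOINTS = [
    (0, 30, 0, 50),        # Good
    (31, 60, 51, 100),     # Satisfactory
    (61, 90, 101, 200),    # Moderately polluted
    (91, 120, 201, 300),   # Poor
    (121, 250, 301, 400),  # Very poor
]
PM10_BREAKPOINTS = [
    (0, 50, 0, 50),
    (51, 100, 51, 100),
    (101, 250, 101, 200),
    (251, 350, 201, 300),
    (351, 430, 301, 400),
]


def sub_index(conc: float, breakpoints: list[tuple[float, float, float, float]]) -> float:
    """Linear interpolation: I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo.

    A concentration in the gap between integer breakpoints (e.g. 30.5) is
    treated as the lower breakpoint of the next band up.
    """
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            c = max(conc, c_lo)
            return (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
    raise ValueError("concentration is above the highest breakpoint supplied")


def overall_aqi(sub_indices: dict[str, float]) -> float:
    """The overall index is the maximum pollutant sub-index."""
    return max(sub_indices.values())
```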
HC · Apr 21, 2020
SurviveCovid-19 -- An Educational Game to Facilitate Habituation of Social Distancing and Other Health Measures for Covid-19 Pandemic
Akhila Sri Manasa Venigalla, Dheeraj Vagavolu, Sridhar Chimalakonda
Covid-19 has been causing severe losses worldwide. Considering its mode of spread and severity, it is essential to make a habit of following safety precautions such as using sanitizers and masks and maintaining social distancing to prevent the spread of Covid-19. Individuals are widely educated about safety measures against the disease through various modes, such as announcements in online or physical awareness campaigns, advertisements in the media and so on. Younger generations today spend considerably more time on mobile phones and games. However, there are very few applications or games aimed at helping people practice safety measures against a pandemic, and even fewer for Covid-19. Hence, we propose a 2D survival-based game, SurviveCovid-19, aimed at educating people about the safety precautions to be taken for Covid-19 outside their homes by incorporating social distancing and the use of masks and sanitizers into the game. SurviveCovid-19 has been designed as an Android-based mobile game, along with a desktop (browser) version, and has been evaluated through a remote quantitative user survey with 30 volunteers, using a questionnaire based on the MEEGA+ model. The survey results are promising, with all survey questions having a mean value greater than 3.5. The game's quality factor was 69.3, indicating that the game could be classified as being of excellent quality according to the MEEGA+ model.
SE · Feb 13, 2020
An Exploratory Study of Code Smells in Web Games
Vartika Agrahari, Sridhar Chimalakonda
With the continuous growth of the internet market, games are becoming more and more popular worldwide. However, increased market competition demands that developers write more efficient games in terms of performance, security and maintenance. The continuous evolution of software systems and their increasing complexity may result in bad design decisions. Researchers have analyzed the cognitive, behavioral and social effects of games, and gameplay and game mechanics have been studied as a way to enhance game playing, but to the best of our knowledge, there is hardly any research that studies bad coding practices in game development. Hence, through our study, we try to identify and analyze the presence of bad coding practices, called code smells, that may cause quality issues in games. To accomplish this, we created a dataset of 361 web games written in JavaScript. On this dataset, we ran a JavaScript code smell detection tool, JSNose, to find the occurrence and distribution of code smells in web games. Further, we conducted a manual study of 9 web games to find violations of existing game programming patterns. Our results show that existing tools are mostly language-specific and are not sufficient in the context of games, as they were not able to detect game-specific anti-patterns or bad coding practices, motivating the need for game-specific code smell detection tools.
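To make the notion of a code smell concrete, here is a deliberately crude sketch of one classic smell check, "long function", run over JavaScript source. It is not JSNose and not game-specific: it is a hypothetical regex-and-brace-counting heuristic that only sees `function name(...) {` declarations and does not skip braces inside strings or comments, which a real detector would handle with a proper parser. The 50-line default threshold is an arbitrary assumption.

```python
import re

# Matches named declarations like `function update(dt) {`; arrow functions,
# methods and parameter lists containing ')' are not recognized.
FUNC_RE = re.compile(r"function\s+([A-Za-z_$][\w$]*)\s*\([^)]*\)\s*\{")


def long_functions(source: str, max_lines: int = 50) -> list[tuple[str, int]]:
    """Return (name, line_count) for each matched function longer than max_lines."""
    smells = []
    for match in FUNC_RE.finditer(source):
        depth = 0
        end = None
        # Start at the opening '{' (last char of the match) and count braces
        # until they balance; braces inside strings/comments are NOT skipped.
        for i in range(match.end() - 1, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        if end is None:
            continue  # unbalanced braces: skip this declaration
        n_lines = source.count("\n", match.start(), end) + 1
        if n_lines > max_lines:
            smells.append((match.group(1), n_lines))
    return smells
```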
SE · May 11, 2019
GE852: A Dataset of 852 Game Engines
Chaitanya S. Lakkundi, Vartika Agrahari, Sridhar Chimalakonda
Game engines provide a platform for developers to build games, with an interface tailored to handle the complexity of game development. To reduce effort and improve the quality of game development, there is a strong need to understand and analyze the quality of game engines and their various aspects, such as API usability, code quality, code reuse and so on. To the best of our knowledge, there is no dataset in the literature that caters to game engines. To this end, we present GE852, a dataset of 852 game engine repositories mined from GitHub in two languages, namely Java and C++. The dataset contains metadata of all the mined repositories, including commits, pull requests, issues and so on. We believe that our dataset can lay the foundation for empirical investigation in the area of game engines.
SE · Feb 14, 2018
A Family of Software Product Lines in Educational Technologies
Sridhar Chimalakonda, Kesav V. Nori
Rapid advances in the education domain demand the design and customization of educational technologies at a large scale and for a variety of evolving requirements. Here, scale refers to the number of systems to be developed, and variety stems from a diversified range of instructional designs (varied goals, processes, content, teacher styles and learner styles) as well as from eLearning Systems for 22 Indian languages and their variants. In this paper, we present a family of software product lines as an approach to address this challenge of modeling a family of instructional designs as well as a family of eLearning Systems, and demonstrate it for the case of adult literacy in India (287 million learners). We present a multi-level product line that connects product lines at multiple levels of granularity in the education domain. We then detail two concrete product lines (http://rice.iiit.ac.in): one that generates instructional design editors, and another that generates a family of eLearning Systems based on flexible instructional designs. Finally, we demonstrate our approach by generating eLearning Systems for the Hindi and Telugu languages (both web and Android versions), which led to significant cost savings of 29 person-months for 9 eLearning Systems.
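The two-level idea above, where one product line yields instructional designs and another derives eLearning Systems from them, can be illustrated with a small sketch. Everything here is hypothetical: the `InstructionalDesign` and `ELearningSystem` types, the variation points (goal, process, language, platform), the example values and the constraint hook are stand-ins, not the authors' models.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class InstructionalDesign:
    """Hypothetical level-1 product: one point in the instructional design space."""
    goal: str
    process: str


@dataclass(frozen=True)
class ELearningSystem:
    """Hypothetical level-2 product: a design bound to a language and a platform."""
    design: InstructionalDesign
    language: str
    platform: str


def design_line(goals: list[str], processes: list[str]) -> list[InstructionalDesign]:
    """Level 1: enumerate every goal/process combination."""
    return [InstructionalDesign(g, p) for g, p in product(goals, processes)]


def system_line(
    designs: list[InstructionalDesign],
    languages: list[str],
    platforms: list[str],
    is_valid: Callable[[ELearningSystem], bool] = lambda s: True,
) -> list[ELearningSystem]:
    """Level 2: derive systems from level-1 designs, dropping invalid variants."""
    candidates = (ELearningSystem(d, lang, plat)
                  for d, lang, plat in product(designs, languages, platforms))
    return [s for s in candidates if is_valid(s)]
```

The `is_valid` hook stands in for the cross-tree constraints of a feature model; a real product line would also generate the artifacts (editors, apps) for each variant rather than merely enumerate configurations.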
CY · Feb 7, 2018
An Ontology Based Modeling Framework for Design of Educational Technologies
Sridhar Chimalakonda, Kesav V. Nori
Despite rapid progress, most educational technologies today lack a strong basis in instructional design knowledge, leading to questionable quality of instruction. In addition, a major challenge is to customize these educational technologies for a wide range of instructional designs. In the literature, ontologies are one of the pertinent mechanisms for representing instructional design. However, existing approaches do not support modeling of flexible instructional designs. To address this problem, in this paper, we propose an ontology-based framework for systematic modeling of different aspects of instructional design knowledge based on domain patterns. As part of the framework, we present ontologies for modeling goals, instructional processes and instructional materials. We demonstrate the ontology framework by presenting instances of the ontology for the large-scale case study of adult literacy in India (287 million learners spread across 22 Indian languages), which requires the creation of 1000 similar but varied eLearning Systems based on flexible instructional designs. The implemented framework is available at http://rice.iiit.ac.in and has been transferred to the National Literacy Mission of the Government of India. This framework could be used for modeling the instructional design knowledge of systems for skills, school education and beyond.
SE · Feb 7, 2018
A Patterns Based Approach for Design of Educational Technologies
Sridhar Chimalakonda, Kesav V. Nori
Instructional design is a fundamental basis for educational technologies, as it lays the foundation for facilitating learning and teaching based on pedagogical underpinnings. However, most educational technologies today face two core challenges in this context: (i) lack of instructional design as a basis, and (ii) lack of support for a variety of instructional designs. To address these challenges, we propose a patterns-based approach for the design of educational technologies. This is in contrast with existing literature, which focuses on patterns in either education or software, but not both. The core idea of our approach is to leverage patterns for modeling instructional design knowledge and to connect them with patterns in software architecture. We discuss different categories of patterns in instructional design. We then present the notion of Pattern-Oriented Instructional Design (POID) as a way to model instructional design as a connection of patterns (GoalPattern, ProcessPattern, ContentPattern) and integrate it with Pattern-Oriented Software Architecture (POSA) based on fundamental principles of software engineering. We demonstrate our approach through an adult literacy case study (287 million learners, 22 Indian languages and a variety of instructional designs). The results of our approach (both web and mobile versions) are available at http://rice.iiit.ac.in and were adopted by the National Literacy Mission Authority of the Government of India.