Amin Alipour

CY
h-index: 46
7 papers
122 citations
Novelty: 19%
AI Score: 32

7 Papers

HC Apr 7
Trust in AI among Middle Eastern CS Students: Investigating Students' Trust and Usage Patterns Across Saudi Arabia, Kuwait and Jordan

Saleh Alkhamees, Ali Alfageeh, Bader Alkhazi et al.

Background and Context: Artificial intelligence (AI) tools have been reshaping computing and computer science education. Trust in AI is a determining factor in the adoption of these tools. Recent studies have shown different trust factors across gender and first-generation status among students. However, these studies have focused mainly on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations, and their generalizability to other populations with different languages and cultures is unclear. Objective: This study aims to evaluate trust in AI among Middle Eastern computer science students and the factors that can impact it. Method: We replicate a recent study of trust in four universities in three Middle Eastern, Arabic-speaking countries: Saudi Arabia, Kuwait, and Jordan. We analyze trust among students across different factors such as gender and first-generation status. Findings: Our results suggest that language fluency can predict trust in AI. Moreover, unlike the results from the US population where female students tended to trust AI more than their male peers, female students in Saudi Arabia indicated lower trust compared to their male counterparts, and we did not observe any noticeable differences across gender in the other countries. We also found a generally negative correlation between English language proficiency and students' confidence. Implications: This study highlights differences in students' adoption and trust in AI even within the same region. It emphasizes the need for more investigation into students' adoption and interaction in non-WEIRD regions for equitable adoption of this technology. It also suggests a need for efforts in designing effective AI systems tailored to the cultural and linguistic needs of the region.

SE Feb 3, 2024
Calibration and Correctness of Language Models for Code

Claudio Spiess, David Gros, Kunal Suresh Pai et al.

Machine learning models are widely used, but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so a rational decision can be made whether to use the output or not. For example, outputs can be associated with a confidence measure; if this confidence measure is strongly associated with likelihood of correctness, then the model is said to be well-calibrated. A well-calibrated confidence measure can serve as a basis for rational, graduated decision-making on how much review and care is needed when using generated code. Calibration has so far been studied in mostly non-generative (e.g. classification) settings, especially in software engineering. However, generated code can quite often be wrong: given generated code, developers must decide whether to use it directly, use it after careful review of varying intensity, or discard it. Thus, calibration is vital in generative settings. We make several contributions. We develop a framework for evaluating the calibration of code-generating models. We consider several tasks, correctness criteria, datasets, and approaches, and find that, by and large, the generative code models we test are not well-calibrated out of the box. We then show how calibration can be improved using standard methods, such as Platt scaling. Since Platt scaling relies on the prior availability of correctness data, we evaluate the applicability and generalizability of Platt scaling in software engineering, and discuss settings where it has good potential for practical use and settings where it does not. Our contributions will lead to better-calibrated decision-making in the current use of code generated by language models, and offer a framework for future research to further improve calibration methods for generative models in software engineering.
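The abstract above mentions measuring calibration and improving it with Platt scaling. The following is a minimal sketch of those two ideas, not the paper's framework. It assumes confidence scores in [0, 1] and binary correctness labels. The metric shown (expected calibration error with equal-width bins) is a commonly used calibration measure chosen here for illustration; the abstract does not say which metrics the authors use. All function names and the sample data are hypothetical.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean gap between average confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, correct, lr=1.0, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    Assumes scores lie in [0, 1]; the fixed learning rate is tuned for that range.
    Requires held-out examples with known correctness labels.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, correct):
            residual = _sigmoid(a * s + b) - y
            grad_a += residual * s
            grad_b += residual
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(scores, a, b):
    """Map raw scores to rescaled probabilities using fitted parameters."""
    return [_sigmoid(a * s + b) for s in scores]


if __name__ == "__main__":
    # Hypothetical raw confidences and correctness labels (1 = generated code passed).
    raw = [0.95, 0.90, 0.92, 0.85, 0.60, 0.70, 0.65, 0.55]
    passed = [1, 1, 0, 1, 0, 0, 1, 0]
    a, b = fit_platt(raw, passed)
    rescaled = apply_platt(raw, a, b)
    print("ECE before:", expected_calibration_error(raw, passed))
    print("ECE after: ", expected_calibration_error(rescaled, passed))
```

In practice the scaling parameters would be fitted on one labeled set and evaluated on another, which reflects the abstract's point that Platt scaling depends on prior correctness data.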

CY Dec 17, 2024
Breaking the Programming Language Barrier: Multilingual Prompting to Empower Non-Native English Learners

James Prather, Brent N. Reeves, Paul Denny et al.

Non-native English speakers (NNES) face multiple barriers to learning programming. These barriers can be obvious, such as the fact that programming language syntax and instruction are often in English, or more subtle, such as being afraid to ask for help in a classroom full of native English speakers. However, these barriers are frustrating because many NNES students know more about programming than they can articulate in English. Advances in generative AI (GenAI) have the potential to break down these barriers because state-of-the-art models can support interactions in multiple languages. Moreover, recent work has shown that GenAI can be highly accurate at code generation and explanation. In this paper, we provide the first exploration of NNES students prompting in their native languages (Arabic, Chinese, and Portuguese) to generate code to solve programming problems. Our results show that students are able to successfully use their native language to solve programming problems, but not without some difficulty specifying programming terminology and concepts. We discuss the challenges they faced, the implications for practice in the short term, and how this might transform computing education globally in the long term.

HC Jan 21, 2025
To Google or To ChatGPT? A Comparison of CS2 Students' Information Gathering Approaches and Outcomes

Aayush Kumar, Daniel Prol, Amin Alipour et al.

LLMs such as ChatGPT have been widely adopted by students in higher education as tools for learning programming and related concepts. However, it remains unclear how effective students are and what strategies students use while learning with LLMs. Since the majority of students' experiences in online self-learning have come through using search engines such as Google, evaluating AI tools in this context can help us address these gaps. In this mixed-methods research, we conducted an exploratory within-subjects study to understand how CS2 students learn programming concepts using both LLMs and traditional online methods such as educational websites and videos, and to examine how students approach learning within and across both scenarios. We discovered that students found it easier to learn a more difficult concept using traditional methods than using ChatGPT. We also found that students ask fewer follow-ups and use more keyword-based queries for search engines, while their prompts to LLMs tend to explicitly ask for information.

SE Mar 2, 2020
Examining user reviews of conversational systems: a case study of Alexa skills

Soodeh Atefi, Andrew Truelove, Matheus Rheinschmitt et al.

Conversational systems use spoken language to interact with their users. Although conversational systems, such as Amazon Alexa, are becoming common and afford interesting functionalities, little is known about the issues users of these systems face. In this paper, we study user reviews of more than 2,800 Alexa skills to understand the characteristics of the reviews and the issues that are raised in them. Our results suggest that most skills receive fewer than 50 reviews. Our qualitative study of user reviews using open coding identified 16 types of issues in the user reviews. Issues related to the content, integration with online services and devices, error, and regression are the top issues raised by the users. Our results also indicate differences in volume and types of complaints by users when compared with more traditional mobile applications. We discuss the implications of our results for practitioners and researchers.

CY May 15, 2019
Smart Contract Development from the Perspective of Developers: Topics and Issues Discussed on Social Media

Afiya Ayman, Shanto Roy, Amin Alipour et al.

Blockchain-based platforms are emerging as a transformative technology that can provide reliability, integrity, and auditability without trusted entities. One of the key features of these platforms is the trustworthy decentralized execution of general-purpose computation in the form of smart contracts, which are envisioned to have a wide range of applications. As a result, a rapidly growing and active community of smart-contract developers has emerged in recent years. A number of research efforts have investigated the technological challenges that these developers face, introducing a variety of tools, languages, and frameworks for smart-contract development, focusing on security. However, relatively little is known about the community itself, about the developers, and about the issues that they face and discuss. To address this gap, we study smart-contract developers and their discussions on two social media sites, Stack Exchange and Medium. We provide insight into the trends and key topics of these discussions, into the developers' interest in various security issues and security tools, and into the developers' technological background.

CL May 3, 2019
Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models

Amirreza Shirani, Bowen Xu, David Lo et al.

Domain-specific community question answering is becoming an integral part of professions. Finding related questions and answers in these communities can significantly improve the effectiveness and efficiency of information seeking. Stack Overflow is one of the most popular communities and is used by millions of programmers. In this paper, we analyze the problem of predicting knowledge unit (question thread) relatedness in Stack Overflow. In particular, we formulate the question relatedness task as a multi-class classification problem with four degrees of relatedness. We present a large-scale dataset with more than 300K pairs. To the best of our knowledge, this dataset is the largest domain-specific dataset for Question-Question relatedness. We present the steps that we took to collect, clean, process, and assure the quality of the dataset. The proposed Stack Overflow dataset is a useful resource to develop novel solutions, specifically data-hungry neural network models, for the prediction of relatedness in technical community question-answering forums. We adopt a neural network architecture and a traditional model for this task that effectively utilize information from different parts of knowledge units to compute the relatedness between them. These models can be used to benchmark novel models, as they perform well in our task and in a closely similar task.