Ahmed Abbasi

CL
h-index78
25papers
242citations
Novelty43%
AI Score53

25 Papers

CLSep 25, 2023
When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs

Marialena Bevilacqua, Kezia Oketch, Ruiyang Qin et al.

The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text are text-generating large-language models such as generative pre-trained transformers (GPTs). We empirically assess the differences in how ML-based scoring models trained on human content assess the quality of content generated by humans versus GPTs. To do so, we propose an analysis framework that encompasses essay scoring ML-models, human and ML-generated essays, and a statistical model that parsimoniously considers the impact of type of respondent, prompt genre, and the ML model used for assessment model. A rich testbed is utilized that encompasses 18,460 human-generated and GPT-based essays. Results of our benchmark analysis reveal that transformer pretrained language models (PLMs) more accurately score human essay quality as compared to CNN/RNN and feature-based ML methods. Interestingly, we find that the transformer PLMs tend to score GPT-generated text 10-15\% higher on average, relative to human-authored documents. Conversely, traditional deep learning and feature-based ML models score human text considerably higher. Further analysis reveals that although the transformer PLMs are exclusively fine-tuned on human text, they more prominently attend to certain tokens appearing only in GPT-generated text, possibly due to familiarity/overlap in pre-training. Our framework and results have implications for text classification settings where automated scoring of text is likely to be disrupted by generative AI.

CLNov 17, 2023
Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads

Yi Yang, Hanyu Duan, Ahmed Abbasi et al.

Transformer-based pretrained large language models (PLM) such as BERT and GPT have achieved remarkable success in NLP tasks. However, PLMs are prone to encoding stereotypical biases. Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown. Understanding the internal stereotyping mechanisms may allow better assessment of model fairness and guide the development of effective mitigation strategies. In this work, we focus on attention heads, a major component of the Transformer architecture, and propose a bias analysis framework to explore and identify a small set of biased heads that are found to contribute to a PLM's stereotypical bias. We conduct extensive experiments to validate the existence of these biased heads and to better understand how they behave. We investigate gender and racial bias in the English language in two types of Transformer-based PLMs: the encoder-based BERT model and the decoder-based autoregressive GPT model. Overall, the results shed light on understanding the bias behavior in pretrained language models.

CLNov 17, 2023
Exploring the Relationship between In-Context Learning and Instruction Tuning

Hanyu Duan, Yixuan Tang, Yi Yang et al.

In-Context Learning (ICL) and Instruction Tuning (IT) are two primary paradigms of adopting Large Language Models (LLMs) to downstream applications. However, they are significantly different. In ICL, a set of demonstrations are provided at inference time but the LLM's parameters are not updated. In IT, a set of demonstrations are used to tune LLM's parameters in training time but no demonstrations are used at inference time. Although a growing body of literature has explored ICL and IT, studies on these topics have largely been conducted in isolation, leading to a disconnect between these two paradigms. In this work, we explore the relationship between ICL and IT by examining how the hidden states of LLMs change in these two paradigms. Through carefully designed experiments conducted with LLaMA-2 (7B and 13B), we find that ICL is implicit IT. In other words, ICL changes an LLM's hidden states as if the demonstrations were used to instructionally tune the model. Furthermore, the convergence between ICL and IT is largely contingent upon several factors related to the provided demonstrations. Overall, this work offers a unique perspective to explore the connection between ICL and IT and sheds light on understanding the behaviors of LLM.

CLNov 21, 2023
Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

Ruiyang Qin, Jun Xia, Zhenge Jia et al.

After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real-time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is not preferred if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations have to be sparse to not affect user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the very first on-device LLM personalization framework.

SPNov 14, 2022
Alternating minimization algorithm with initialization analysis for r-local and k-sparse unlabeled sensing

Ahmed Abbasi, Shuchin Aeron, Abiy Tasissa

Unlabeled sensing is a linear inverse problem with permuted measurements. We propose an alternating minimization (AltMin) algorithm with a suitable initialization for two widely considered permutation models: partially shuffled/$k$-sparse permutations and $r$-local/block diagonal permutations. Key to the performance of the AltMin algorithm is the initialization. For the exact unlabeled sensing problem, assuming either a Gaussian measurement matrix or a sub-Gaussian signal, we bound the initialization error in terms of the number of blocks $s$ and the number of shuffles $k$. Experimental results show that our algorithm is fast, applicable to both permutation models, and robust to choice of measurement matrix. We also test our algorithm on several real datasets for the linked linear regression problem and show superior performance compared to baseline methods.

CLMar 14, 2025Code
Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring

Kezia Oketch, John P. Lalor, Yi Yang et al.

Closed large language models (LLMs) such as GPT-4 have set state-of-the-art results across a number of NLP tasks and have become central to NLP and machine learning (ML)-driven solutions. Closed LLMs' performance and wide adoption has sparked considerable debate about their accessibility in terms of availability, cost, and transparency. In this study, we perform a rigorous comparative analysis of nine leading LLMs, spanning closed, open, and open-source LLM ecosystems, across text assessment and generation tasks related to automated essay scoring. Our findings reveal that for few-shot learning-based assessment of human generated essays, open LLMs such as Llama 3 and Qwen2.5 perform comparably to GPT-4 in terms of predictive performance, with no significant differences in disparate impact scores when considering age- or race-related fairness. Moreover, Llama 3 offers a substantial cost advantage, being up to 37 times more cost-efficient than GPT-4. For generative tasks, we find that essays generated by top open LLMs are comparable to closed LLMs in terms of their semantic composition/embeddings and ML assessed scores. Our findings challenge the dominance of closed LLMs and highlight the democratizing potential of open LLMs, suggesting they can effectively bridge accessibility divides while maintaining competitive performance and fairness.

CLFeb 5
Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks

Guangwei Zhang, Jianing Zhu, Cheng Qian et al.

We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. The system treats copyright infringement versus compliance as an evidence discovery process rather than a static classification task due to the complex nature of copyright law. It integrates multiple detection paradigms, including content recall testing, paraphrase-level similarity analysis, persuasive jailbreak probing, and unlearning verification, within a unified and extensible framework. Through interactive prompting, response collection, and iterative workflows, our system enables systematic auditing of verbatim memorization and paraphrase-level leakage, supporting responsible deployment and transparent evaluation of LLM copyright risks even with black-box access.

CYNov 2, 2023
Data Science for Social Good

Ahmed Abbasi, Roger H. L. Chiang, Jennifer J. Xu

Data science has been described as the fourth paradigm for scientific discovery. The latest wave of data science research, pertaining to machine learning and artificial intelligence (AI), is growing exponentially and garnering millions of annual citations. However, this growth has been accompanied by a diminishing emphasis on social good challenges - our analysis reveals that the proportion of data science research focusing on social good is less than it has ever been. At the same time, the proliferation of machine learning and generative AI have sparked debates about the socio-technical prospects and challenges associated with data science for human flourishing, organizations, and society. Against this backdrop, we present a framework for "data science for social good" (DSSG) research that considers the interplay between relevant data science research genres, social good challenges, and different levels of socio-technical abstraction. We perform an analysis of the literature to empirically demonstrate the paucity of work on DSSG in information systems (and other related disciplines) and highlight current impediments. We then use our proposed framework to introduce the articles appearing in the special issue. We hope that this article and the special issue will spur future DSSG research and help reverse the alarming trend across data science research over the past 30-plus years in which social good challenges are garnering proportionately less attention with each passing day.

HCMay 4
Augmenting Interface Usability Heuristics for Reliable Computer-Use Agents

Jiateng Liu, Rushi Wang, Bingxuan Li et al.

Recent advances have enabled general computer-use agents that interpret screens and execute grounded actions from human instructions, yet they still struggle to generalize to unseen and evolving interfaces. While improving agent capability remains important, agent compatible interface design offers a complementary path by aligning interaction semantics with agent prior knowledge. In this paper, we revisit Nielsen 10 usability heuristics through the lens of computer-use agents, identifying which principles naturally transfer, where implicit design assumptions create agent specific failures, and how safe additive augmentations can improve robustness without harming human usability. To evaluate these ideas, we introduce UI-Verse, a suite of controlled environments built around functionally similar interfaces with different applied heuristics. Experiments show that our augmented heuristics consistently improve task completion and modestly improve efficiency, with combined heuristics yielding further gains. Human studies further show that these designs preserve the original interaction workflow without observable usability regressions. Overall, our findings highlight interface design as a practical complementary avenue for improving the reliability and generalization of computer use agents.

LGFeb 9, 2024
FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via Large Language Models

Ruiyang Qin, Yuting Hu, Zheyu Yan et al.

Neural Architecture Search (NAS) has become the de fecto tools in the industry in automating the design of deep neural networks for various applications, especially those driven by mobile and edge devices with limited computing resources. The emerging large language models (LLMs), due to their prowess, have also been incorporated into NAS recently and show some promising results. This paper conducts further exploration in this direction by considering three important design metrics simultaneously, i.e., model accuracy, fairness, and hardware deployment efficiency. We propose a novel LLM-based NAS framework, FL-NAS, in this paper, and show experimentally that FL-NAS can indeed find high-performing DNNs, beating state-of-the-art DNN models by orders-of-magnitude across almost all design considerations.

CLSep 23, 2025
LLMs4All: A Systematic Review of Large Language Models Across Academic Disciplines

Yanfang Ye, Zheyuan Zhang, Tianyi Ma et al.

Cutting-edge Artificial Intelligence (AI) techniques keep reshaping our view of the world. For example, Large Language Models (LLMs) based applications such as ChatGPT have shown the capability of generating human-like conversation on extensive topics. Due to the impressive performance on a variety of language-related tasks (e.g., open-domain question answering, translation, and document summarization), one can envision the far-reaching impacts that can be brought by the LLMs with broader real-world applications (e.g., customer service, education and accessibility, and scientific discovery). Inspired by their success, this paper will offer an overview of state-of-the-art LLMs and their integration into a wide range of academic disciplines, including: (1) arts, letters, and law (e.g., history, philosophy, political science, arts and architecture, law), (2) economics and business (e.g., finance, economics, accounting, marketing), and (3) science and engineering (e.g., mathematics, physics and mechanical engineering, chemistry and chemical engineering, life sciences and bioengineering, earth sciences and civil engineering, computer science and electrical engineering). Integrating humanity and technology, in this paper, we will explore how LLMs are shaping research and practice in these fields, while also discussing key limitations, open challenges, and future directions in the era of generative AI. The review of how LLMs are engaged across disciplines-along with key observations and insights-can help researchers and practitioners interested in exploiting LLMs to advance their works in diverse real-world applications.

LGMar 5, 2025
Predicting Practically? Domain Generalization for Predictive Analytics in Real-world Environments

Hanyu Duan, Yi Yang, Ahmed Abbasi et al.

Predictive machine learning models are widely used in customer relationship management (CRM) to forecast customer behaviors and support decision-making. However, the dynamic nature of customer behaviors often results in significant distribution shifts between training data and serving data, leading to performance degradation in predictive models. Domain generalization, which aims to train models that can generalize to unseen environments without prior knowledge of their distributions, has become a critical area of research. In this work, we propose a novel domain generalization method tailored to handle complex distribution shifts, encompassing both covariate and concept shifts. Our method builds upon the Distributionally Robust Optimization framework, optimizing model performance over a set of hypothetical worst-case distributions rather than relying solely on the training data. Through simulation experiments, we demonstrate the working mechanism of the proposed method. We also conduct experiments on a real-world customer churn dataset, and validate its effectiveness in both temporal and spatial generalization settings. Finally, we discuss the broader implications of our method for advancing Information Systems (IS) design research, particularly in building robust predictive models for dynamic managerial environments.

CLSep 2, 2025
Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts

Rushi Wang, Jiateng Liu, Cheng Qian et al.

Incorporating external context can significantly enhance the response quality of Large Language Models (LLMs). However, real-world contexts often mix relevant information with disproportionate inappropriate content, posing reliability risks. How do LLMs process and prioritize mixed context? To study this, we introduce the Poisoned Context Testbed, pairing queries with real-world contexts containing relevant and inappropriate content. Inspired by associative learning in animals, we adapt the Rescorla-Wagner (RW) model from neuroscience to quantify how competing contextual signals influence LLM outputs. Our adapted model reveals a consistent behavioral pattern: LLMs exhibit a strong tendency to incorporate information that is less prevalent in the context. This susceptibility is harmful in real-world settings, where small amounts of inappropriate content can substantially degrade response quality. Empirical evaluations on our testbed further confirm this vulnerability. To tackle this, we introduce RW-Steering, a two-stage finetuning-based approach that enables the model to internally identify and ignore inappropriate signals. Unlike prior methods that rely on extensive supervision across diverse context mixtures, RW-Steering generalizes robustly across varying proportions of inappropriate content. Experiments show that our best fine-tuned model improves response quality by 39.8% and reverses the undesirable behavior curve, establishing RW-Steering as a robust, generalizable context engineering solution for improving LLM safety in real-world use.

CLSep 9, 2025
PersonaFuse: A Personality Activation-Driven Framework for Enhancing Human-LLM Interactions

Yixuan Tang, Yi Yang, Ahmed Abbasi

Recent advancements in Large Language Models (LLMs) demonstrate remarkable capabilities across various fields. These developments have led to more direct communication between humans and LLMs in various situations, such as social companionship and psychological support. However, LLMs often exhibit limitations in emotional perception and social competence during real-world conversations. These limitations partly originate from their inability to adapt their communication style and emotional expression to different social and task contexts. In this work, we introduce PersonaFuse, a novel LLM post-training framework that enables LLMs to adapt and express different personalities for varying situations. Inspired by Trait Activation Theory and the Big Five personality model, PersonaFuse employs a Mixture-of-Expert architecture that combines persona adapters with a dynamic routing network, enabling contextual trait expression. Experimental results show that PersonaFuse substantially outperforms baseline models across multiple dimensions of social-emotional intelligence. Importantly, these gains are achieved without sacrificing general reasoning ability or model safety, which remain common limitations of direct prompting and supervised fine-tuning approaches. PersonaFuse also delivers consistent improvements in downstream human-centered applications, such as mental health counseling and review-based customer service. Finally, human preference evaluations against leading LLMs, including GPT-4o and DeepSeek, demonstrate that PersonaFuse achieves competitive response quality despite its comparatively smaller model size. These findings demonstrate that PersonaFuse offers a theoretically grounded and practical approach for developing social-emotional enhanced LLMs, marking a significant advancement toward more human-centric AI systems.

CLAug 6, 2025
Benchmarking Sociolinguistic Diversity in Swahili NLP: A Taxonomy-Guided Approach

Kezia Oketch, John P. Lalor, Ahmed Abbasi

We introduce the first taxonomy-guided evaluation of Swahili NLP, addressing gaps in sociolinguistic diversity. Drawing on health-related psychometric tasks, we collect a dataset of 2,170 free-text responses from Kenyan speakers. The data exhibits tribal influences, urban vernacular, code-mixing, and loanwords. We develop a structured taxonomy and use it as a lens for examining model prediction errors across pre-trained and instruction-tuned language models. Our findings advance culturally grounded evaluation frameworks and highlight the role of sociolinguistic variation in shaping model performance.

CLJul 30, 2025
PersonaTwin: A Multi-Tier Prompt Conditioning Framework for Generating and Evaluating Personalized Digital Twins

Sihan Chen, John P. Lalor, Yi Yang et al.

While large language models (LLMs) afford new possibilities for user modeling and approximation of human behaviors, they often fail to capture the multidimensional nuances of individual users. In this work, we introduce PersonaTwin, a multi-tier prompt conditioning framework that builds adaptive digital twins by integrating demographic, behavioral, and psychometric data. Using a comprehensive data set in the healthcare context of more than 8,500 individuals, we systematically benchmark PersonaTwin against standard LLM outputs, and our rigorous evaluation unites state-of-the-art text similarity metrics with dedicated demographic parity assessments, ensuring that generated responses remain accurate and unbiased. Experimental results show that our framework produces simulation fidelity on par with oracle settings. Moreover, downstream models trained on persona-twins approximate models trained on individuals in terms of prediction and fairness metrics across both GPT-4o-based and Llama-based models. Together, these findings underscore the potential for LLM digital twin-based approaches in producing realistic and emotionally nuanced user simulations, offering a powerful tool for personalized digital user modeling and behavior analysis.

LGFeb 4, 2025
Aligning Human and Machine Attention for Enhanced Supervised Learning

Avihay Chriqui, Inbal Yahav, Dov Teeni et al.

Attention, or prioritization of certain information items over others, is a critical element of any learning process, for both humans and machines. Given that humans continue to outperform machines in certain learning tasks, it seems plausible that machine performance could be enriched by aligning machine attention with human attention mechanisms -- yet research on this topic is sparse and has achieved only limited success. This paper proposes a new approach to address this gap, called Human-Machine Attention Learning (HuMAL). This approach involves reliance on data annotated by humans to reflect their self-perceived attention during specific tasks. We evaluate several alternative strategies for integrating such human attention data into machine learning (ML) algorithms, using a sentiment analysis task (review data from Yelp) and a personality-type classification task (data from myPersonality). The best-performing HuMAL strategy significantly enhances the task performance of fine-tuned transformer models (BERT, as well as GPT-2 and XLNET), and the benefit is particularly pronounced under challenging conditions of imbalanced or sparse labeled data. This research contributes to a deeper understanding of strategies for integrating human attention into ML models and highlights the potential of leveraging human cognition to augment ML in real-world applications.

LGJun 6, 2024
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Ruiyang Qin, Dancheng Liu, Chenhui Xu et al.

The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become more and more prevalent. An urging but open question is how a resource-constrained computing environment would affect the design choices for a personalized LLM. We study this problem empirically in this work. In particular, we consider the tradeoffs among a number of key design factors and their intertwined impacts on learning efficiency and accuracy. The factors include the learning methods for LLM customization, the amount of personalized data used for learning customization, the types and sizes of LLMs, the compression methods of LLMs, the amount of time afforded to learn, and the difficulty levels of the target use cases. Through extensive experimentation and benchmarking, we draw a number of surprisingly insightful guidelines for deploying LLMs onto resource-constrained devices. For example, an optimal choice between parameter learning and RAG may vary depending on the difficulty of the downstream task, the longer fine-tuning time does not necessarily help the model, and a compressed LLM may be a better choice than an uncompressed LLM to learn from limited personalized data.

CLJul 25, 2020
Constructing a Testbed for Psychometric Natural Language Processing

Ahmed Abbasi, David G. Dobolyi, Richard G. Netemeyer

Psychometric measures of ability, attitudes, perceptions, and beliefs are crucial for understanding user behaviors in various contexts including health, security, e-commerce, and finance. Traditionally, psychometric dimensions have been measured and collected using survey-based methods. Inferring such constructs from user-generated text could afford opportunities for timely, unobtrusive, collection and analysis. In this paper, we describe our efforts to construct a corpus for psychometric natural language processing (NLP). We discuss our multi-step process to align user text with their survey-based response items and provide an overview of the resulting testbed which encompasses survey-based psychometric measures and accompanying user-generated text from over 8,500 respondents. We report preliminary results on the use of the text to categorize/predict users' survey response labels. We also discuss the important implications of our work and resulting testbed for future psychometric NLP research.

SPNov 14, 2019
R-local unlabeled sensing: A novel graph matching approach for multiview unlabeled sensing under local permutations

Ahmed Abbasi, Abiy Tasissa, Shuchin Aeron

Unlabeled sensing is a linear inverse problem where the measurements are scrambled under an unknown permutation leading to loss of correspondence between the measurements and the rows of the sensing matrix. Motivated by practical tasks such as mobile sensor networks, target tracking and the pose and correspondence estimation between point clouds, we study a special case of this problem restricting the class of permutations to be local and allowing for multiple views. In this setting, namely unlabeled multi-view sensing with local permutation, previous results and algorithms are not directly applicable. In this paper, we propose a computationally efficient algorithm that creatively exploits the machinery of graph alignment and Gromov-Wasserstein alignment and leverages the multiple views to estimate the local permutations. Simulation results on synthetic data sets indicate that the proposed algorithm is scalable and applicable to the challenging regimes of low to moderate SNR.

IRSep 27, 2013
A Random Walk Model for Item Recommendation in Folksonomies

Zhu Zhang, Daniel Zeng, Ahmed Abbasi et al.

Social tagging, as a novel approach to information organization and discovery, has been widely adopted in many Web2.0 applications. The tags provide a new type of information that can be exploited by recommender systems. Nevertheless, the sparsity of ternary <user, tag, item> interaction data limits the performance of tag-based collaborative filtering. This paper proposes a random-walk-based algorithm to deal with the sparsity problem in social tagging data, which captures the potential transitive associations between users and items through their interaction with tags. In particular, two smoothing strategies are presented from both the user-centric and item-centric perspectives. Experiments on real-world data sets empirically demonstrate the efficacy of the proposed algorithm.

IRSep 27, 2013
Evaluating the Usefulness of Sentiment Information for Focused Crawlers

Tianjun Fu, Ahmed Abbasi, Daniel Zeng et al.

Despite the prevalence of sentiment-related content on the Web, there has been limited work on focused crawlers capable of effectively collecting such content. In this study, we evaluated the efficacy of using sentiment-related information for enhanced focused crawling of opinion-rich web content regarding a particular topic. We also assessed the impact of using sentiment-labeled web graphs to further improve collection accuracy. Experimental results on a large test bed encompassing over half a million web pages revealed that focused crawlers utilizing sentiment information as well as sentiment-labeled web graphs are capable of gathering more holistic collections of opinion-related content regarding a particular topic. The results have important implications for business and marketing intelligence gathering efforts in the Web 2.0 era.

CYSep 27, 2013
Evaluating Link-Based Techniques for Detecting Fake Pharmacy Websites

Ahmed Abbasi, Siddharth Kaza, F. Mariam Zahedi

Fake online pharmacies have become increasingly pervasive, constituting over 90% of online pharmacy websites. There is a need for fake website detection techniques capable of identifying fake online pharmacy websites with a high degree of accuracy. In this study, we compared several well-known link-based detection techniques on a large-scale test bed with the hyperlink graph encompassing over 80 million links between 15.5 million web pages, including 1.2 million known legitimate and fake pharmacy pages. We found that the QoC and QoL class propagation algorithms achieved an accuracy of over 90% on our dataset. The results revealed that algorithms that incorporate dual class propagation as well as inlink and outlink information, on page-level or site-level graphs, are better suited for detecting fake pharmacy websites. In addition, site-level analysis yielded significantly better results than page-level analysis for most algorithms evaluated.

CYSep 27, 2013
A Statistical Learning Based System for Fake Website Detection

Ahmed Abbasi, Zhu Zhang, Hsinchun Chen

Existing fake website detection systems are unable to effectively detect fake websites. In this study, we advocate the development of fake website detection systems that employ classification methods grounded in statistical learning theory (SLT). Experimental results reveal that a prototype system developed using SLT-based methods outperforms seven existing fake website detection systems on a test bed encompassing 900 real and fake websites.

CYSep 27, 2013
Detecting Fake Escrow Websites using Rich Fraud Cues and Kernel Based Methods

Ahmed Abbasi, Hsinchun Chen

The ability to automatically detect fraudulent escrow websites is important in order to alleviate online auction fraud. Despite research on related topics, fake escrow website categorization has received little attention. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of features extracted from web page text, image, and link information. We also proposed a composite kernel tailored to represent the properties of fake websites, including content duplication and structural attributes. Experiments were conducted to assess the proposed features, techniques, and kernels on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow sites. The combination of an extended feature set and the composite kernel attained over 98% accuracy when differentiating fake sites from real ones, using the support vector machines algorithm. The results suggest that automated web-based information systems for detecting fake escrow sites could be feasible and may be utilized as authentication mechanisms.