48.9CVMar 23
Efficient Zero-Shot AI-Generated Image DetectionRyosuke Sonoda, Ramya Srinivasan
The rapid progress of text-to-image models has made AI-generated images increasingly realistic, posing significant challenges for accurate detection of generated content. While training-based detectors often suffer from limited generalization to unseen images, training-free approaches offer better robustness, yet struggle to capture subtle discrepancies between real and synthetic images. In this work, we propose a training-free AI-generated image detection method that measures representation sensitivity to structured frequency perturbations, enabling detection of minute manipulations. The proposed method is computationally lightweight, as perturbation generation requires only a single Fourier transform for an input image. As a result, it achieves one to two orders of magnitude faster inference than most training-free detectors.Extensive experiments on challenging benchmarks demonstrate the efficacy of our method over state-of-the-art (SoTA). In particular, on OpenFake benchmark, our method improves AUC by nearly $10\%$ compared to SoTA, while maintaining substantially lower computational cost.
AIDec 13, 2023
Human-in-the-loop Fairness: Integrating Stakeholder Feedback to Incorporate Fairness Perspectives in Responsible AIEvdoxia Taka, Yuri Nakao, Ryosuke Sonoda et al.
Fairness is a growing concern for high-risk decision-making using Artificial Intelligence (AI) but ensuring it through purely technical means is challenging: there is no universally accepted fairness measure, fairness is context-dependent, and there might be conflicting perspectives on what is considered fair. Thus, involving stakeholders, often without a background in AI or fairness, is a promising avenue. Research to directly involve stakeholders is in its infancy, and many questions remain on how to support stakeholders to feedback on fairness, and how this feedback can be integrated into AI models. Our work follows an approach where stakeholders can give feedback on specific decision instances and their outcomes with respect to their fairness, and then to retrain an AI model. In order to investigate this approach, we conducted two studies of a complex AI model for credit rating used in loan applications. In study 1, we collected feedback from 58 lay users on loan application decisions, and conducted offline experiments to investigate the effects on accuracy and fairness metrics. In study 2, we deepened this investigation by showing 66 participants the results of their feedback with respect to fairness, and then conducted further offline analyses. Our work contributes two datasets and associated code frameworks to bootstrap further research, highlights the opportunities and challenges of employing lay user feedback for improving AI fairness, and discusses practical implications for developing AI applications that more closely reflect stakeholder views about fairness.
CLOct 22, 2024
A Statistical Analysis of LLMs' Self-Evaluation Using ProverbsRyosuke Sonoda, Ramya Srinivasan
Large language models (LLMs) such as ChatGPT, GPT-4, Claude-3, and Llama are being integrated across a variety of industries. Despite this rapid proliferation, experts are calling for caution in the interpretation and adoption of LLMs, owing to numerous associated ethical concerns. Research has also uncovered shortcomings in LLMs' reasoning and logical abilities, raising questions on the potential of LLMs as evaluation tools. In this paper, we investigate LLMs' self-evaluation capabilities on a novel proverb reasoning task. We introduce a novel proverb database consisting of 300 proverb pairs that are similar in intent but different in wordings, across topics spanning gender, wisdom, and society. We propose tests to evaluate textual consistencies as well as numerical consistencies across similar proverbs, and demonstrate the effectiveness of our method and dataset in identifying failures in LLMs' self-evaluation which in turn can highlight issues related to gender stereotypes and lack of cultural understanding in LLMs.
CVOct 20, 2025
Fair and Interpretable Deepfake Detection in VideosAkihito Yoshii, Ryosuke Sonoda, Ramya Srinivasan
Existing deepfake detection methods often exhibit bias, lack transparency, and fail to capture temporal information, leading to biased decisions and unreliable results across different demographic groups. In this paper, we propose a fairness-aware deepfake detection framework that integrates temporal feature learning and demographic-aware data augmentation to enhance fairness and interpretability. Our method leverages sequence-based clustering for temporal modeling of deepfake videos and concept extraction to improve detection reliability while also facilitating interpretable decisions for non-expert users. Additionally, we introduce a demography-aware data augmentation method that balances underrepresented groups and applies frequency-domain transformations to preserve deepfake artifacts, thereby mitigating bias and improving generalization. Extensive experiments on FaceForensics++, DFD, Celeb-DF, and DFDC datasets using state-of-the-art (SoTA) architectures (Xception, ResNet) demonstrate the efficacy of the proposed method in obtaining the best tradeoff between fairness and accuracy when compared to SoTA.
LGMay 23, 2023
Fair Oversampling Technique using Heterogeneous ClustersRyosuke Sonoda
Class imbalance and group (e.g., race, gender, and age) imbalance are acknowledged as two reasons in data that hinder the trade-off between fairness and utility of machine learning classifiers. Existing techniques have jointly addressed issues regarding class imbalance and group imbalance by proposing fair over-sampling techniques. Unlike the common oversampling techniques, which only address class imbalance, fair oversampling techniques significantly improve the abovementioned trade-off, as they can also address group imbalance. However, if the size of the original clusters is too small, these techniques may cause classifier overfitting. To address this problem, we herein develop a fair oversampling technique using data from heterogeneous clusters. The proposed technique generates synthetic data that have class-mix features or group-mix features to make classifiers robust to overfitting. Moreover, we develop an interpolation method that can enhance the validity of generated synthetic data by considering the original cluster distribution and data noise. Finally, we conduct experiments on five realistic datasets and three classifiers, and the experimental results demonstrate the effectiveness of the proposed technique in terms of fairness and utility.
LGOct 29, 2021
A Pre-processing Method for Fairness in RankingRyosuke Sonoda
Fair ranking problems arise in many decision-making processes that often necessitate a trade-off between accuracy and fairness. Many existing studies have proposed correction methods such as adding fairness constraints to a ranking model's loss. However, the challenge of correcting the data bias for fair ranking remains, and the trade-off of the ranking models leaves room for improvement. In this paper, we propose a fair ranking framework that evaluates the order of training data in a pairwise manner as well as various fairness measurements in ranking. This study is the first proposal of a pre-processing method that solves fair ranking problems using the pairwise ordering method with our best knowledge. The fair pairwise ordering method is prominent in training the fair ranking models because it ensures that the resulting ranking likely becomes parity across groups. As far as the fairness measurements in ranking are represented as a linear constraint of the ranking models, we proved that the minimization of loss function subject to the constraints is reduced to the closed solution of the minimization problem augmented by weights to training data. This closed solution inspires us to present a practical and stable algorithm that iterates the optimization of weights and model parameters. The empirical results over real-world datasets demonstrated that our method outperforms the existing methods in the trade-off between accuracy and fairness over real-world datasets and various fairness measurements.