CRMay 28
Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model UsageShahinul Hoque, Jinghuai Zhang, Jinyuan Sun et al.
Per-token billing is now the standard pricing model for commercial large language models (LLMs), so the honesty of reported token counts directly affects what users pay. We show that this kind of billing is hard to audit by design: providers hide the model, the tokenizer, and the execution to protect their IP, mitigate jailbreaks, and preserve user privacy, which means an auditor can only inspect proofs the provider supplies. The audit therefore reduces to a consistency check on the provider's own reports. We call this a trust paradox: every audit must trust some artifact, but current frameworks trust exactly the ones a provider has the strongest reason to manipulate. We study three recent token auditing frameworks and show that a provider with ordinary commercial capabilities can systematically inflate billed token counts. In the most permissive setting, hidden reasoning usage can be inflated by 1,469% on average without detection. At current frontier reasoning prices, that turns a \$100 honest bill into roughly a \$1,569 bill on the same query. Even when the user can see the full reasoning string, tokenization ambiguity alone still allows 50.85% over-reporting below the detection threshold. These results suggest the problem is not in any specific auditor but in any audit whose evidence comes from the audited party. Restoring honest billing will require verification that ties reported token counts to evidence the provider does not control, such as trusted execution attestation, cryptographic proofs of inference, or third-party re-execution.
CVNov 17, 2025
Accuracy is Not Enough: Poisoning Interpretability in Federated Learning via Color SkewFarhin Farhad Riya, Shahinul Hoque, Jinyuan Stella Sun et al.
As machine learning models are increasingly deployed in safety-critical domains, visual explanation techniques have become essential tools for supporting transparency. In this work, we reveal a new class of attacks that compromise model interpretability without affecting accuracy. Specifically, we show that small color perturbations applied by adversarial clients in a federated learning setting can shift a model's saliency maps away from semantically meaningful regions while keeping the prediction unchanged. The proposed saliency-aware attack framework, called Chromatic Perturbation Module, systematically crafts adversarial examples by altering the color contrast between foreground and background in a way that disrupts explanation fidelity. These perturbations accumulate across training rounds, poisoning the global model's internal feature attributions in a stealthy and persistent manner. Our findings challenge a common assumption in model auditing that correct predictions imply faithful explanations and demonstrate that interpretability itself can be an attack surface. We evaluate this vulnerability across multiple datasets and show that standard training pipelines are insufficient to detect or mitigate explanation degradation, especially in the federated learning setting, where subtle color perturbations are harder to discern. Our attack reduces peak activation overlap in Grad-CAM explanations by up to 35% while preserving classification accuracy above 96% on all evaluated datasets.
CVMay 9, 2023
Effects of Real-Life Traffic Sign Alteration on YOLOv7- an Object Recognition ModelFarhin Farhad Riya, Shahinul Hoque, Md Saif Hassan Onim et al.
The widespread adoption of Image Processing has propelled Object Recognition (OR) models into essential roles across various applications, demonstrating the power of AI and enabling crucial services. Among the applications, traffic sign recognition stands out as a popular research topic, given its critical significance in the development of autonomous vehicles. Despite their significance, real-world challenges, such as alterations to traffic signs, can negatively impact the performance of OR models. This study investigates the influence of altered traffic signs on the accuracy and effectiveness of object recognition, employing a publicly available dataset to introduce alterations in shape, color, content, visibility, angles and background. Focusing on the YOLOv7 (You Only Look Once) model, the study demonstrates a notable decline in detection and classification accuracy when confronted with traffic signs in unusual conditions including the altered traffic signs. Notably, the alterations explored in this study are benign examples and do not involve algorithms used for generating adversarial machine learning samples. This study highlights the significance of enhancing the robustness of object detection models in real-life scenarios and the need for further investigation in this area to improve their accuracy and reliability.