SE AI CL | Oct 15, 2025

TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models

Ruoyu Sun, Da Song, Jiayang Song, Yuheng Huang, Lei Ma

arXiv:2510.13106v1 | 3.4 | h-index: 11 | ASE

Originality: Synthesis-oriented

AI Analysis

This work addresses trustworthiness concerns for users of LLMs in NLP applications, but it is incremental as it builds on existing perturbation methods and evaluation techniques.

The authors address the problem of evaluating trustworthiness in Large Language Models by introducing TRUSTVIS, an automated framework with an interactive interface. In preliminary case studies on Vicuna-7b, Llama2-7b, and GPT-3.5, the framework identified safety and robustness vulnerabilities.

As Large Language Models (LLMs) continue to revolutionize Natural Language Processing (NLP) applications, critical concerns about their trustworthiness persist, particularly in safety and robustness. To address these challenges, we introduce TRUSTVIS, an automated evaluation framework that provides a comprehensive assessment of LLM trustworthiness. A key feature of our framework is its interactive user interface, designed to offer intuitive visualizations of trustworthiness metrics. By integrating well-known perturbation methods like AutoDAN and employing majority voting across various evaluation methods, TRUSTVIS not only provides reliable results but also makes complex evaluation processes accessible to users. Preliminary case studies on models like Vicuna-7b, Llama2-7b, and GPT-3.5 demonstrate the effectiveness of our framework in identifying safety and robustness vulnerabilities, while the interactive interface allows users to explore results in detail, empowering targeted model improvements. Video Link: https://youtu.be/k1TrBqNVg8g
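
The abstract mentions majority voting across evaluation methods but does not spell out the mechanics. The following is a minimal sketch of what such an aggregation step could look like, not the authors' implementation. The function name `majority_vote`, the `"safe"`/`"unsafe"` labels, and the conservative tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(verdicts, tie_label="unsafe"):
    """Aggregate per-evaluator verdicts into one label.

    verdicts:  list of labels, one per evaluation method (hypothetical).
    tie_label: label returned when no single label has the most votes
               (assumed here to be the conservative choice).
    """
    if not verdicts:
        raise ValueError("need at least one verdict")

    # Count how many evaluators produced each label, most common first.
    counts = Counter(verdicts).most_common()
    top_label, top_count = counts[0]

    # Collect every label that shares the highest vote count.
    tied = [label for label, count in counts if count == top_count]

    return top_label if len(tied) == 1 else tie_label


# Example: three hypothetical evaluators judge one model response.
print(majority_vote(["safe", "unsafe", "unsafe"]))
```

Using an odd number of evaluators avoids ties for a two-label scheme; the `tie_label` fallback only matters when the evaluator count is even or more than two labels are in play.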
