Gustavo Voltani von Atzingen

AI
3 papers
8 citations
Novelty 18%
AI Score 31

3 Papers

8.7 AI Jun 4
GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

Paulo Ricardo Ferreira Neves, Edson Rodrigues da Cruz Filho, Paulo Henrique Eleuterio Falsetti et al.

Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark evaluations may be affected by contamination and partial information leakage, compromising performance estimates. This work presents GuardNet, a guardrail system based on an ensemble of shallow neural networks (BiLSTMs) with approximately 47 million parameters. We investigate the hypothesis that robustness in adversarial scenarios depends more on the diversity of example coverage and threshold calibration than on model scale. The results indicate that GuardNet achieves competitive performance compared with lightweight detectors and high efficiency at low latency, although larger LLMs such as Mistral-7B and Llama-3.1-8B still achieve superior performance in terms of F1 score and AUROC on the blind JBB-Behaviors benchmark. Nevertheless, GuardNet achieves an AUROC of 0.747 on the blind dataset (n = 200) and an F1 score of 0.92 on a proprietary benchmark (n = 50), under threshold calibration and evaluation with declared partial information leakage. The system operates with an average latency of approximately 50 ms on CPU, making it suitable for deployment in production environments with cost and infrastructure constraints.
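
The abstract does not say how the ensemble members' outputs are combined or how the decision threshold is calibrated. As a minimal sketch only, the Python below assumes each member emits an attack probability, averages those probabilities, and picks the threshold that maximizes F1 on a small held-out calibration set. All names (`ensemble_score`, `calibrate_threshold`) and all numbers are hypothetical illustrations, not values or methods from the paper.

```python
from statistics import mean


def ensemble_score(member_scores):
    """Average the per-model attack probabilities for one prompt (assumed rule)."""
    return mean(member_scores)


def f1_at(threshold, scores, labels):
    """F1 score when every prompt with score >= threshold is flagged as an attack."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(scores, labels):
    """Return the observed score that gives the best F1 on the calibration set."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


# Hypothetical probabilities from three ensemble members for four prompts.
per_prompt = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.6, 0.7, 0.5], [0.4, 0.2, 0.3]]
labels = [1, 0, 1, 0]  # 1 = attack, 0 = benign (made-up calibration labels)

scores = [ensemble_score(m) for m in per_prompt]
threshold = calibrate_threshold(scores, labels)
flags = [s >= threshold for s in scores]
```

Choosing the threshold on a calibration set that is separate from the evaluation data matters here, since the abstract itself flags contamination and partial information leakage as risks to performance estimates.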

CL Oct 25, 2023
Detection of news written by the ChatGPT through authorship attribution performed by a Bidirectional LSTM model

Amanda Ferrari Iaquinta, Gustavo Voltani von Atzingen

The large language model-based chatbot ChatGPT has gained a lot of popularity since its launch and has been used in a wide range of situations. This research centers on one particular situation: the use of ChatGPT to produce news that will be consumed by the population, which facilitates the production of fake news, the spread of misinformation, and a lack of trust in news sources. Aware of these problems, this research aims to build an artificial intelligence model capable of performing authorship attribution on news articles, identifying the ones written by ChatGPT. To achieve this goal, a dataset containing equal amounts of human-written and ChatGPT-written news was assembled, and different natural language processing techniques were used to extract features from it. These features were used to train, validate, and test three models built with different techniques. The best performance was produced by the Bidirectional Long Short-Term Memory (LSTM) neural network model, achieving 91.57% accuracy on the test set.

NC Jul 29, 2021
EEG multipurpose eye blink detector using convolutional neural network

Amanda Ferrari Iaquinta, Ana Carolina de Sousa Silva, Aldrumont Ferraz Júnior et al.

The electrical signal emitted by eye movement produces a very strong artifact on the EEG signal due to its close proximity to the sensors and its frequent occurrence. In the context of detecting eye blink artifacts in EEG waveforms for further removal and signal purification, multiple strategies have been proposed in the literature. The most commonly applied methods require a large number of electrodes and complex equipment for sampling and processing data. The goal of this work is to create a reliable and user-independent algorithm for detecting and removing eye blinks in EEG signals using a CNN (convolutional neural network). For training and validation, three sets of public EEG data were used. All three sets contain samples obtained while the recruited subjects performed assigned tasks that included blinking voluntarily at specific moments, watching a video, and reading an article. The model used in this study was able to gain a comprehensive understanding of the features that distinguish an ordinary EEG signal from a signal contaminated with eye blink artifacts, without overfitting to specific features that only occurred in the situations in which the signals were recorded.