CL Apr 23
Misinformation Span Detection in Videos via Audio Transcripts
Breno Matos, Rennan C. Lima, Savvas Zannettou et al.
Online misinformation is one of the most challenging issues of recent years, with severe consequences that include political polarization, attacks on democracy, and public health risks. Misinformation appears on any platform with a large user base, including online social networks and messaging apps, and it permeates all media and content forms, including images, text, audio, and video. Video-based misinformation in particular poses a multifaceted challenge for fact-checkers, given the ease with which individuals can record and upload videos to video-sharing platforms. Previous research on detecting video-based misinformation has focused on whether a video as a whole shares misinformation. While this approach is useful, it gives a limited and not easily interpretable view of the problem, because it does not indicate when misinformation occurs within a video or which content (i.e., claims) is responsible for the video's misinformation nature. In this work, we attempt to bridge this research gap by creating two novel datasets that allow us to explore misinformation detection in videos via audio transcripts, focusing on identifying the spans of a video that are responsible for its misinformation claim (misinformation span detection). We transcribe each video's audio to text and identify the video segment in which the misinformation claims appear, resulting in two datasets of more than 500 videos with over 2,400 segments containing annotated fact-checked claims. We then employ classifiers built with state-of-the-art language models, and our results show that we can identify which part of a video contains misinformation with an F1 score of 0.68. We make our annotated datasets publicly available. We also release all transcripts, audio, and videos.
IR Aug 7, 2025
On the Reliability of Sampling Strategies in Offline Recommender Evaluation
Bruno L. Pereira, Alan Said, Rodrygo L. T. Santos
Offline evaluation plays a central role in benchmarking recommender systems when online testing is impractical or risky. However, it is susceptible to two key sources of bias: exposure bias, where users only interact with items they are shown, and sampling bias, introduced when evaluation is performed on a subset of logged items rather than the full catalog. While prior work has proposed methods to mitigate sampling bias, these are typically assessed on fixed logged datasets rather than for their ability to support reliable model comparisons under varying exposure conditions or relative to true user preferences. In this paper, we investigate how different combinations of logging and sampling choices affect the reliability of offline evaluation. Using a fully observed dataset as ground truth, we systematically simulate diverse exposure biases and assess the reliability of common sampling strategies along four dimensions: sampling resolution (recommender model separability), fidelity (agreement with full evaluation), robustness (stability under exposure bias), and predictive power (alignment with ground truth). Our findings highlight when and how sampling distorts evaluation outcomes and offer practical guidance for selecting strategies that yield faithful and robust offline comparisons.
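As a rough illustration of the sampling bias described in this abstract, the sketch below ranks one held-out item against the full catalog and against a sampled candidate set. It is not the paper's protocol: the uniform negative sampler, the random scores, the catalog size, and the names `rank_of_target` and `sample_negatives` are all assumptions made for the example.

```python
import random

def rank_of_target(scores, target, candidates):
    """1-based rank of `target` among `candidates`, by descending score."""
    t = scores[target]
    return 1 + sum(1 for c in candidates if c != target and scores[c] > t)

def sample_negatives(catalog, target, n, rng):
    """Uniformly sample n items other than the target (an assumed strategy)."""
    pool = [i for i in catalog if i != target]
    return rng.sample(pool, n)

rng = random.Random(0)
catalog = list(range(1000))                  # hypothetical item catalog
scores = {i: rng.random() for i in catalog}  # stand-in for model scores
target = 42                                  # hypothetical held-out item

# Full evaluation: the target competes with every item in the catalog.
full_rank = rank_of_target(scores, target, catalog)

# Sampled evaluation: the target competes with 99 sampled negatives only.
candidates = sample_negatives(catalog, target, 99, rng) + [target]
sampled_rank = rank_of_target(scores, target, candidates)

print(full_rank, sampled_rank)
```

Because the sampled candidates are a subset of the catalog, the sampled rank can never be worse than the full rank, which is one way a sampled metric can overstate a model's quality.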
LG Mar 29, 2025
Towards Symmetric Low-Rank Adapters
Tales Panoutsos, Rodrygo L. T. Santos, Flavio Figueiredo
In this paper, we introduce Symmetric Low-Rank Adapters, an optimized variant of LoRA with even fewer weights. This method uses low-rank symmetric weight matrices to learn downstream tasks more efficiently. Traditional LoRA accumulates fine-tuning weights with the original pre-trained weights via a Singular Value Decomposition (SVD)-like approach, i.e., model weights are fine-tuned via updates of the form $BA$ (where $B \in \mathbb{R}^{n\times r}$, $A \in \mathbb{R}^{r\times n}$, and $r$ is the rank of the merged weight matrix). In contrast, our approach, named SymLoRA, represents fine-tuning weights as a spectral decomposition, i.e., $Q \, \mathrm{diag}(\Lambda)\, Q^T$, where $Q \in \mathbb{R}^{n\times r}$ and $\Lambda \in \mathbb{R}^r$. SymLoRA requires approximately half of the fine-tuning weights. Here, we show that this approach has negligible losses in downstream efficacy.
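The two update forms in this abstract can be compared with a minimal NumPy sketch. It only builds the update matrices and counts trainable weights; it is not the authors' implementation, and the sizes `n = 8`, `r = 2`, the random initialization, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def lora_update(B, A):
    """LoRA-style update BA, with B of shape (n, r) and A of shape (r, n)."""
    return B @ A

def symlora_update(Q, lam):
    """Symmetric update Q diag(lam) Q^T, with Q of shape (n, r) and lam of shape (r,)."""
    return (Q * lam) @ Q.T  # scaling the columns of Q by lam equals Q @ diag(lam)

n, r = 8, 2  # hypothetical layer width and adapter rank
rng = np.random.default_rng(0)

B, A = rng.normal(size=(n, r)), rng.normal(size=(r, n))
Q, lam = rng.normal(size=(n, r)), rng.normal(size=r)

delta_lora = lora_update(B, A)
delta_sym = symlora_update(Q, lam)

lora_params = B.size + A.size   # 2 * n * r trainable weights
sym_params = Q.size + lam.size  # n * r + r trainable weights

print(lora_params, sym_params)
```

With these sizes the symmetric form stores $nr + r$ weights against $2nr$ for $BA$, which matches the "approximately half" figure in the abstract, and the resulting update is symmetric by construction.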
IR Mar 4, 2016
Simplified Relative Citation Ratio for Static Paper Ranking: UFMG/LATIN at WSDM Cup 2016
Sabir Ribas, Alberto Ueda, Rodrygo L. T. Santos et al.
Static rankings of papers play a key role in the academic search setting. Many features are commonly used in the literature to produce such rankings, including citation-based metrics and distinct applications of PageRank, among others. More recently, learning-to-rank techniques have been successfully applied to combine sets of features, producing effective results. In this work, we propose the metric S-RCR, a simplified version of a metric called Relative Citation Ratio; both are based on the idea of a co-citation network. Compared to the classical version, our simplification S-RCR leads to improved efficiency with reasonable effectiveness. We use S-RCR to rank over 120 million papers in the Microsoft Academic Graph dataset. By using this single feature, which has no parameters and does not need to be tuned, our team reached 3rd position in the first phase of the WSDM Cup 2016.