LG · Aug 6, 2024
RHiOTS: A Framework for Evaluating Hierarchical Time Series Forecasting Algorithms
Luis Roque, Carlos Soares, Luís Torgo
We introduce the Robustness of Hierarchically Organized Time Series (RHiOTS) framework, designed to assess the robustness of hierarchical time series forecasting models and algorithms on real-world datasets. Hierarchical time series, where lower-level forecasts must sum to upper-level ones, are prevalent in various contexts, such as retail sales across countries. Current empirical evaluations of forecasting methods are often limited to a small set of benchmark datasets, offering a narrow view of algorithm behavior. RHiOTS addresses this gap by systematically altering existing datasets and modifying the characteristics of individual series and their interrelations. It uses a set of parameterizable transformations to simulate those changes in the data distribution. Additionally, RHiOTS incorporates an innovative visualization component, turning complex, multidimensional robustness evaluation results into intuitive, easily interpretable visuals. This approach allows an in-depth analysis of algorithm and model behavior under diverse conditions. We illustrate the use of RHiOTS by analyzing the predictive performance of several algorithms. Our findings show that traditional statistical methods are more robust than state-of-the-art deep learning algorithms, except when the transformation effect is highly disruptive. Furthermore, we found no significant differences in the robustness of the algorithms when applying specific reconciliation methods, such as MinT. RHiOTS provides researchers with a comprehensive tool for understanding the nuanced behavior of forecasting algorithms, offering a more reliable basis for selecting the most appropriate method for a given problem.
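The coherence constraint mentioned above, where lower-level forecasts must sum to upper-level ones, can be illustrated with a summing matrix and bottom-up aggregation. This is a minimal sketch assuming a hypothetical two-series hierarchy (Total = A + B). It is not part of RHiOTS, and it does not implement MinT or any other reconciliation method evaluated in the paper.

```python
import numpy as np

# Hypothetical toy hierarchy: rows = [Total, A, B], columns = bottom series [A, B].
S = np.array([
    [1, 1],  # Total = A + B
    [1, 0],  # A
    [0, 1],  # B
])

def bottom_up(bottom_forecasts: np.ndarray) -> np.ndarray:
    """Map bottom-level forecasts of shape (n_bottom, horizon) to every level of the hierarchy."""
    return S @ bottom_forecasts

def is_coherent(all_forecasts: np.ndarray, atol: float = 1e-9) -> bool:
    """For this toy hierarchy only: check that the Total row equals the sum of the bottom rows."""
    total, bottom = all_forecasts[0], all_forecasts[1:]
    return bool(np.allclose(total, bottom.sum(axis=0), atol=atol))

# Two-step-ahead forecasts for the bottom series A and B (made-up numbers).
b_hat = np.array([[10.0, 12.0],
                  [20.0, 18.0]])
y_tilde = bottom_up(b_hat)
print(y_tilde)
# [[30. 30.]
#  [10. 12.]
#  [20. 18.]]
print(is_coherent(y_tilde))  # True
```

Bottom-up aggregation is coherent by construction; reconciliation methods such as MinT instead start from independent forecasts at every level and project them onto the coherent subspace spanned by S.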
IR · Dec 19, 2016
Data-Driven Relevance Judgments for Ranking Evaluation
Nuno Moniz, Luís Torgo, João Vinagre
Ranking evaluation metrics are a fundamental element of design and improvement efforts in information retrieval. We observe that most popular metrics disregard the information conveyed by the scores used to derive rankings, when such scores are available. This may pose a numerical scaling problem, causing an under- or over-estimation of the evaluation depending on the degree of divergence between the scores of ranked items. The purpose of this work is to propose a principled way of quantifying multi-graded relevance judgments of items and to enable a more accurate penalization of ordering errors in rankings. We propose a data-driven generation of relevance functions based on the degree of divergence among a set of items' scores, and we apply it in the evaluation metric Normalized Discounted Cumulative Gain (nDCG). We use synthetic data to demonstrate the value of our proposal, and a combination of news items from Google News and their respective popularity on Twitter to compare its performance with the standard nDCG. Results show that our proposal provides a more fine-grained evaluation of rankings than the standard nDCG, and that the latter frequently under- or over-estimates its evaluation scores in light of the divergence of items' scores.
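For reference, the standard nDCG that the proposal is compared against normalizes the discounted cumulative gain of a ranking by that of the ideal ordering. The sketch below uses one common formulation (exponential gain 2^rel - 1 with a log2 rank discount) and takes integer relevance grades as given; the paper's data-driven relevance function is not reproduced here, and the function names are illustrative.

```python
import math
from typing import Sequence

def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: sum of (2^rel - 1) / log2(rank + 1), ranks starting at 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances_in_predicted_order: Sequence[float]) -> float:
    """Normalize DCG by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0

# Two items with grades 3 and 2, ranked in the wrong order.
print(round(ndcg([2, 3]), 3))  # 0.834
print(ndcg([3, 2]))            # 1.0
```

Because the grades here are fixed discrete labels, swapping two items yields the same penalty regardless of how far apart their underlying scores were; this is the limitation the abstract attributes to the standard metric.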
IR · Jun 4, 2015
Socially Driven News Recommendation
Nuno Moniz, Luís Torgo, Magdalini Eirinaki
The participatory Web has enabled ubiquitous and pervasive access to information, accompanied by an increase in the speed and reach of information sharing. Data dissemination services such as news aggregators are expected to provide up-to-date, real-time information to end users. News aggregators are in essence recommendation systems that filter and rank news stories in order to select the few that will appear on the users' front screen at any time. One of the main challenges in such systems is to address the recency and latency problems, that is, to identify as soon as possible how important a news story is. In this work, we propose an integrated framework that predicts the importance of news items upon their publication, focusing on recent and highly popular news and employing resampling strategies, and that translates these predictions into concrete news rankings. We perform an extensive experimental evaluation of the proposed framework on real-life datasets, both as a stand-alone system and when applied to news recommendations from Google News. Additionally, we propose and evaluate a combinatorial solution for augmenting official media recommendations with social information. Results show that the proposed approach complements and enhances the news rankings generated by state-of-the-art systems.
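Highly popular news items are rare, so a regression model trained on raw data tends to underpredict them; resampling rebalances the training set before predicted importance is turned into a ranking. The sketch below shows generic random undersampling for regression: it keeps every case whose target exceeds a threshold and a random fraction of the others, then sorts items by a predicted score. The threshold rule, the function names, and the data are hypothetical illustrations; this is not the authors' specific resampling strategy or model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (features, target popularity)

def random_undersample(samples: List[Sample], threshold: float,
                       keep_fraction: float, seed: int = 0) -> List[Sample]:
    """Keep every rare case (target >= threshold) plus a random fraction of the common cases."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    rng = random.Random(seed)
    rare = [s for s in samples if s[1] >= threshold]
    common = [s for s in samples if s[1] < threshold]
    n_keep = round(len(common) * keep_fraction)
    return rare + rng.sample(common, n_keep)

def rank_by_predicted_importance(items: List[str],
                                 predict: Callable[[str], float]) -> List[str]:
    """Order items from highest to lowest predicted importance."""
    return sorted(items, key=predict, reverse=True)

# Hypothetical data: 100 items whose target equals their index, 10 of them "highly popular".
samples = [([float(i)], float(i)) for i in range(100)]
balanced = random_undersample(samples, threshold=90.0, keep_fraction=0.2)
print(len(balanced))  # 28 (10 rare + 18 of the 90 common cases)

scores = {"a": 0.1, "b": 0.9, "c": 0.5}  # made-up predicted importance
print(rank_by_predicted_importance(["a", "b", "c"], scores.get))  # ['b', 'c', 'a']
```

Undersampling is only one option; oversampling the rare cases or generating synthetic ones are common alternatives when discarding data is too costly.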