LG Aug 21, 2023
Extreme Multilabel Classification for Specialist Doctor Recommendation with Implicit Feedback and Limited Patient Metadata
Filipa Valdeira, Stevo Racković, Valeria Danalachi et al.
Recommendation Systems (RS) are often used to address the issue of medical doctor referrals. However, these systems require access to patient feedback and medical records, which may not always be available in real-world scenarios. Our research focuses on medical referrals and aims to predict recommendations of physicians across different specialties for both new patients and those with a consultation history. We use Extreme Multilabel Classification (XML), commonly employed in text-based classification tasks, to encode available features and explore different scenarios. While its potential for recommendation tasks has often been suggested, it has not been thoroughly explored in the literature. Motivated by the doctor referral case, we show how to recast a traditional recommender setting into a multilabel classification problem that current XML methods can solve. Further, we propose a unified model leveraging patient history across different specialties. Compared to state-of-the-art RS using the same features, our approach consistently improves standard recommendation metrics by up to approximately $10\%$ for patients with a previous consultation history. For new patients, XML proves better at exploiting available features, outperforming the benchmark in favorable scenarios, with particular emphasis on recall metrics. Thus, our approach brings us one step closer to creating more effective and personalized doctor referral systems. Additionally, it highlights XML as a promising alternative to current hybrid or content-based RS, while identifying key aspects to take into account when using XML for recommendation tasks.
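The recasting step described in this abstract can be pictured with a minimal sketch. It assumes implicit feedback arrives as (patient, doctor) consultation pairs and that each patient has a small metadata vector; the function name `build_multilabel_dataset` and the data layout are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_multilabel_dataset(consultations, patient_features):
    """Turn implicit feedback into a multilabel problem (illustrative only).

    consultations: list of (patient_id, doctor_id) pairs (assumed layout).
    patient_features: dict mapping patient_id to a feature list (assumed layout).
    Returns patient ids, doctor ids, feature rows X and binary label rows Y.
    """
    # Every doctor seen in the consultation log becomes one label.
    doctors = sorted({doctor for _, doctor in consultations})
    doctor_index = {doctor: j for j, doctor in enumerate(doctors)}

    # Collect the set of label indices observed for each patient.
    labels = defaultdict(set)
    for patient, doctor in consultations:
        labels[patient].add(doctor_index[doctor])

    patients = sorted(patient_features)
    X = [patient_features[p] for p in patients]
    # Patients without history (new patients) get an all-zero label row.
    Y = [
        [1 if j in labels[p] else 0 for j in range(len(doctors))]
        for p in patients
    ]
    return patients, doctors, X, Y
```

Any multilabel classifier could then be trained on `X` and `Y`; the XML methods evaluated in the paper are not reproduced here.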
CV Mar 26, 2022
Probabilistic Registration for Gaussian Process 3D shape modelling in the presence of extensive missing data
Filipa Valdeira, Ricardo Ferreira, Alessandra Micheletti et al.
We propose a shape fitting/registration method based on a Gaussian Process formulation, suitable for shapes with extensive regions of missing data. Gaussian Processes are a proven, powerful tool, as they provide a unified setting for shape modelling and fitting. While the existing methods in this area work well for the general case of the human head, the results are not satisfactory for more detailed and deformed data with a high prevalence of missing data, such as the ears. To overcome this, we formulate the shape fitting problem as a multi-annotator Gaussian Process Regression and establish a parallel with standard probabilistic registration. The resulting method, SFGP, performs better on extensive areas of missing data than a state-of-the-art registration method and current approaches for registration with pre-existing shape models. Experiments are conducted on both a small 2D dataset with diverse transformations and a 3D dataset of ears.
OC Jul 21, 2024
Generalizing Trilateration: Approximate Maximum Likelihood Estimator for Initial Orbit Determination in Low-Earth Orbit
Ricardo Ferreira, Filipa Valdeira, Marta Guimarães et al.
With the increase in the number of active satellites and space debris in orbit, the problem of initial orbit determination (IOD) becomes increasingly important, demanding high accuracy. Over the years, different approaches have been presented, such as filtering methods (for example, the Extended Kalman Filter), differential algebra, or solving Lambert's problem. In this work, we consider a setting of three monostatic radars, where all available measurements are taken at approximately the same instant. This follows a similar setting to trilateration, a state-of-the-art approach, where each radar is able to obtain a single measurement of range and range-rate. In contrast, and owing to advances in Multiple-Input Multiple-Output (MIMO) radars, we assume that each location is able to obtain a larger set of range, angle and Doppler shift measurements. Thus, our method can be understood as an extension of trilateration leveraging more recent technology and incorporating additional data. We formulate the problem as a Maximum Likelihood Estimator (MLE), which for some number of observations is asymptotically unbiased and asymptotically efficient. Through numerical experiments, we demonstrate that our method attains the same accuracy as the trilateration method for the same number of measurements and offers an alternative and generalization, returning a more accurate estimation of the satellite's state vector as the number of available measurements increases.
CL Mar 21
Can Large Language Models Reliably Extract Physiology Index Values from Coronary Angiography Reports?
Sofia Morgado, Filipa Valdeira, Niklas Sander et al.
Coronary angiography (CAG) reports contain clinically relevant physiological measurements, yet this information is typically in the form of unstructured natural language, limiting its use in research. We investigate the use of Large Language Models (LLMs) to automatically extract these values, along with their anatomical locations, from Portuguese CAG reports. To our knowledge, this study is the first to address the extraction of physiology indices from a large corpus (1342 reports) of CAG reports, and one of the few focusing on CAG or Portuguese clinical text. We explore local privacy-preserving general-purpose and medical LLMs under different settings. Prompting strategies included zero-shot, few-shot, and few-shot prompting with implausible examples. In addition, we apply constrained generation and introduce a post-processing step based on RegEx. Given the sparsity of measurements, we propose a multi-stage evaluation framework separating format validity, value detection, and value correctness, while accounting for asymmetric clinical error costs. This study demonstrates the potential of LLMs for extracting physiological indices from Portuguese CAG reports. Non-medical models performed similarly: the best results were obtained with Llama under zero-shot prompting, while GPT-OSS demonstrated the highest robustness to changes in the prompts. While MedGemma demonstrated similar results to non-medical models, MedLlama's results were out-of-format in the unconstrained setting and showed significantly lower performance in the constrained one. Changes in the prompting technique and the addition of a RegEx layer showed no significant improvement across models, while constrained generation decreased performance, although it has the benefit of allowing the use of models that otherwise cannot conform to the templates.
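To make the RegEx post-processing and the staged evaluation more concrete, here is a minimal sketch under stated assumptions: the model answers with free text that either contains a decimal between 0 and 1 (written with `.` or `,`) or the literal word `none`. The pattern `VALUE_RE`, the `none` convention, and both function names are hypothetical; the paper's actual templates are not given in the abstract, and the asymmetric clinical error costs are not modeled here.

```python
import re

# Hypothetical pattern: 0 or 1, a decimal separator ('.' or ','), then 1-2 digits.
VALUE_RE = re.compile(r"\b([01])[.,](\d{1,2})\b")


def parse_value(raw):
    """Return the first matching value as a float, or None if nothing matches."""
    match = VALUE_RE.search(raw)
    if match is None:
        return None
    return float(f"{match.group(1)}.{match.group(2)}")


def evaluate(outputs, gold, tol=1e-9):
    """Count outcomes at three stages: format, detection, correctness.

    outputs: raw model strings; gold: true values (float) or None if absent.
    """
    valid = detected = both_present = correct = 0
    for raw, truth in zip(outputs, gold):
        text = raw.strip().lower()
        value = parse_value(text)
        # Stage 1: format validity (a parsable value or an explicit "none").
        if value is None and text != "none":
            continue
        valid += 1
        # Stage 2: detection (does predicted presence match true presence?).
        if (value is None) == (truth is None):
            detected += 1
            # Stage 3: correctness, only when a value is truly present.
            if value is not None:
                both_present += 1
                correct += abs(value - truth) <= tol
    return {
        "valid": valid,
        "detected": detected,
        "both_present": both_present,
        "correct": correct,
    }
```

Returning raw counts rather than ratios keeps the sketch safe when a stage has no eligible samples, which matters when measurements are sparse.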
SY Dec 20, 2023
One-Shot Initial Orbit Determination in Low-Earth Orbit
Ricardo Ferreira, Marta Guimarães, Filipa Valdeira et al.
Due to the importance of satellites for society and the exponential increase in the number of objects in orbit, it is important to accurately determine the state (e.g., position and velocity) of these Resident Space Objects (RSOs) at any time and in a timely manner. State-of-the-art methodologies for initial orbit determination consist of Kalman-type filters that process sequential data over time and return the state and associated uncertainty of the object, as is the case of the Extended Kalman Filter (EKF). However, these methodologies depend on a good initial guess for the state vector and usually simplify the physical dynamical model, due to the difficulty of precisely modeling perturbative forces, such as atmospheric drag and solar radiation pressure. Other approaches, such as the trilateration method, do not require assumptions about the dynamical system but do require simultaneous measurements, namely three measurements of range and range-rate in the particular case of trilateration. We consider the same setting of simultaneous measurements (one-shot), resorting to time delay and Doppler shift measurements. Based on recent advancements in the problem of moving target localization for sonar multistatic systems, we are able to formulate the problem of initial orbit determination as a Weighted Least Squares problem. With this approach, we are able to directly obtain the state of the object (position and velocity) and the associated covariance matrix from the Fisher Information Matrix (FIM). We demonstrate that, for small noise, our estimator is able to attain the Cramér-Rao Lower Bound accuracy, i.e., the accuracy attained by the unbiased estimator with minimum variance. We also numerically demonstrate that our estimator is able to attain better accuracy on the state estimation than the trilateration method and returns a smaller uncertainty associated with the estimation.
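The link between Weighted Least Squares, the FIM and the Cramér-Rao Lower Bound can be illustrated with the textbook linear-Gaussian case. This is a generic sketch, not the paper's formulation: the symbols $H$ (a known measurement matrix), $\Sigma$ (the noise covariance) and the linear model itself are assumptions, whereas the time delay and Doppler shift measurements in the paper are nonlinear in the state.

```latex
\begin{aligned}
y &= Hx + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Sigma), \\
\hat{x} &= \arg\min_{x}\, (y - Hx)^{\top} \Sigma^{-1} (y - Hx)
         = \left(H^{\top} \Sigma^{-1} H\right)^{-1} H^{\top} \Sigma^{-1} y, \\
\mathrm{FIM} &= H^{\top} \Sigma^{-1} H, \qquad
\operatorname{Cov}(\hat{x}) = \mathrm{FIM}^{-1}.
\end{aligned}
```

In this idealized setting the estimator is unbiased and its covariance equals the inverse FIM, so it meets the Cramér-Rao Lower Bound exactly; for a nonlinear measurement model such a result would typically hold only approximately, which is consistent with the small-noise condition stated in the abstract.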
LG Feb 3, 2022
Ranking with Confidence for Large Scale Comparison Data
Filipa Valdeira, Cláudia Soares
In this work, we leverage a generative data model considering comparison noise to develop a fast, precise, and informative ranking algorithm from pairwise comparisons that produces a measure of confidence on each comparison. The problem of ranking a large number of items from noisy and sparse pairwise comparison data arises in diverse applications, like ranking players in online games, document retrieval or ranking human perceptions. Although different algorithms are available, we need fast, large-scale algorithms whose accuracy degrades gracefully when the number of comparisons is too small. Fitting our proposed model entails solving a non-convex optimization problem, which we tightly approximate by a sum of quasi-convex functions and a regularization term. Resorting to an iterative reweighted minimization and the Primal-Dual Hybrid Gradient method, we obtain PD-Rank, achieving a Kendall tau 0.1 higher than all compared methods, even with 10\% wrong comparisons in simulated data matching our data model, and leading in accuracy if data is generated according to the Bradley-Terry model, in both cases faster by one order of magnitude, in seconds. On real data, PD-Rank requires less computational time than active learning methods to achieve the same Kendall tau.
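For readers unfamiliar with the metric reported above, the following is a minimal sketch of Kendall tau (the tau-a variant, which ignores tie corrections) between two rankings. The function name and the dict-based input layout are illustrative assumptions; PD-Rank itself is not implemented here.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings given as {item: position} dicts.

    Both dicts are assumed to rank the same items, at least two of them.
    """
    items = list(rank_a)
    concordant = discordant = 0
    for i, j in combinations(items, 2):
        # Positive product: both rankings order the pair the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A value of 1 means the two rankings agree on every pair and -1 means they disagree on every pair, so a gain of 0.1 corresponds to 5% of all item pairs switching from discordant to concordant.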
CV Aug 22, 2020
From noisy point clouds to complete ear shapes: unsupervised pipeline
Filipa Valdeira, Ricardo Ferreira, Alessandra Micheletti et al.
Ears are a particularly difficult region of the human face to model, not only due to the non-rigid deformations existing between shapes but also due to the challenges in processing the retrieved data. The first step towards obtaining a good model is to have complete scans in correspondence, but ear scans usually present more occlusions, noise and outliers than most face regions, thus requiring a specific procedure. Therefore, we propose a complete pipeline taking as input unordered 3D point clouds with the aforementioned problems, and producing as output a dataset in correspondence, with completion of the missing data. We provide a comparison of several state-of-the-art registration methods and propose a new approach for one of the steps of the pipeline, with better performance on our data.