Andrea Bottino

CV · 5 papers · 167 citations

5 Papers

CV · Sep 21, 2024
Egocentric zone-aware action recognition across environments

Simone Alberto Peirone, Gabriele Goletto, Mirco Planamente et al.

Human activities exhibit a strong correlation between actions and the places where they are performed, such as washing something at a sink. More specifically, in daily living environments we may identify particular locations, hereinafter named activity-centric zones, which may afford a set of homogeneous actions. Knowledge of these zones can serve as a prior that helps vision models recognize human activities. However, the appearance of these zones is scene-specific, which limits the transferability of this prior to unfamiliar areas and domains. The problem is particularly relevant in egocentric vision, where the environment takes up most of the image, making it even harder to separate the action from its context. In this paper, we discuss the importance of decoupling the domain-specific appearance of activity-centric zones from their universal, domain-agnostic representations, and show how the latter can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models. We validate our solution on the EPIC-Kitchens-100 and Argo1M datasets.

AI · Sep 3, 2021
A brief history of AI: how to prevent another winter (a critical review)

Amirhosein Toosi, Andrea Bottino, Babak Saboury et al.

The field of artificial intelligence (AI), regarded as one of the most enigmatic areas of science, has grown exponentially in the past decade, producing a remarkably wide array of applications that already affect our everyday lives. Advances in computing power and the design of sophisticated AI algorithms have enabled computers to outperform humans in a variety of tasks, especially in computer vision and speech recognition. Yet AI's path has never been smooth: the field has essentially collapsed twice in its lifetime (the 'winters' of AI), each time after a period of popular success (the 'summers' of AI). We provide a brief rundown of AI's evolution over the decades, highlighting its crucial moments and major turning points from its inception to the present. In doing so, we attempt to learn from this history, anticipate the future, and discuss what steps may be taken to prevent another 'winter'.

HC · Apr 1, 2021
Training Medical Communication Skills with Virtual Patients: Literature Review and Directions for Future Research

Edoardo Battegazzorre, Andrea Bottino, Fabrizio Lamberti

Effective communication is a crucial skill for healthcare providers, since it improves patient health and satisfaction and helps avoid malpractice claims. In standard medical education, students' communication skills are trained through role-playing and Standardized Patients (SPs), i.e., actors. However, SPs are difficult to standardize and very resource-intensive. Virtual Patients (VPs), interactive computer-based systems, represent a valuable alternative: they can portray patients in realistic clinical scenarios and engage learners in realistic conversations. Medical communication skill training with VPs has been an active research area over the last ten years, and as a result the number of works in this field has grown significantly. The objective of this work is to survey the recent literature and assess the state of the art of this technology, with a specific focus on the instructional and technical design of VP simulations. After classifying and analysing the VPs selected for our research, we identified several areas that require further investigation, and we drafted practical recommendations for VP developers on design aspects that, based on our findings, are pivotal to creating novel and effective VP simulations or improving existing ones.

CV · Mar 23, 2021
DA4Event: towards bridging the Sim-to-Real Gap for Event Cameras using Domain Adaptation

Mirco Planamente, Chiara Plizzari, Marco Cannici et al.

Event cameras are novel bio-inspired sensors that asynchronously capture pixel-level intensity changes in the form of "events". The way they acquire data offers several advantages over standard cameras, especially in poor lighting and under high-speed motion. However, because these sensors are so new, there is still a lack of training data in the amounts needed to fully unlock their potential. The most common approach researchers use to address this issue is to leverage simulated event data. Yet this approach raises an open research question: how well do simulated data generalize to real data? To answer it, we propose to exploit recent Domain Adaptation (DA) advances from traditional computer vision in the event-based context, showing that DA techniques applied to event data help reduce the sim-to-real gap. To this end, we propose a novel architecture, which we call Multi-View DA4E (MV-DA4E), that better exploits the peculiarities of frame-based event representations while also promoting domain-invariant characteristics in the learned features. Through extensive experiments, we demonstrate the effectiveness of DA methods and MV-DA4E on N-Caltech101. Moreover, we validate their soundness in a real-world scenario through a cross-domain analysis on the popular RGB-D Object Dataset (ROD), which we extended to the event modality (RGB-E).
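The abstract does not specify which frame-based event representations MV-DA4E consumes. Purely to illustrate what turning an asynchronous event stream into a "frame" can mean, the sketch below builds one common representation, a two-channel per-polarity event-count image. The `Event` tuple layout (x, y, timestamp, polarity) and the function name `events_to_count_frame` are hypothetical choices for this example, not the paper's code.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One event from an event camera (hypothetical layout for this sketch)."""
    x: int    # pixel column
    y: int    # pixel row
    t: float  # timestamp in seconds (ignored by a plain count image)
    p: int    # polarity: +1 brightness increase, -1 decrease


def events_to_count_frame(events: List[Event], width: int, height: int) -> List[List[List[int]]]:
    """Accumulate events into a 2-channel count image indexed [channel][y][x].

    Channel 0 counts positive-polarity events, channel 1 negative-polarity ones.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for e in events:
        if not (0 <= e.x < width and 0 <= e.y < height):
            continue  # skip events outside the sensor area
        channel = 0 if e.p > 0 else 1
        frame[channel][e.y][e.x] += 1
    return frame


# Usage: two positive events at (0, 0) and one negative event at (1, 0).
stream = [Event(0, 0, 0.001, 1), Event(0, 0, 0.002, 1), Event(1, 0, 0.003, -1)]
frame = events_to_count_frame(stream, width=2, height=1)
print(frame)  # [[[2, 0]], [[0, 1]]]
```

A count image discards the timing of events within the accumulation window; representations that split the window into several temporal bins keep more timing information at the cost of more channels.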

CV · Feb 10, 2020
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Mirco Planamente, Andrea Bottino, Barbara Caputo

Wearable cameras are becoming increasingly popular in many applications, raising the research community's interest in approaches for recognizing actions from the first-person point of view. An open challenge in egocentric action recognition is that videos lack detailed information about the main actor's pose and tend to record only parts of the movement when focusing on manipulation tasks. The amount of information about the action itself is therefore limited, which makes understanding the manipulated objects and their context crucial. Many previous works addressed this issue with two-stream architectures, where one stream models the appearance of the objects involved in the action and the other extracts motion features from optical flow. In this paper, we argue that learning features jointly from these two information channels helps better capture the spatio-temporal correlations between them. To this end, we propose a single-stream architecture that does so by adding a self-supervised block that uses a pretext motion prediction task to intertwine motion and appearance knowledge. Experiments on several publicly available databases demonstrate the effectiveness of our approach.
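The abstract does not detail the self-supervised block or its loss. A common way to attach a pretext task to a single-stream network is to optimize a weighted sum of the main action-classification loss and the pretext loss; the sketch below assumes that scheme, with mean squared error on a flattened motion map standing in for the motion prediction loss. The function names, the MSE choice and the `weight` parameter are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two flattened maps of equal size."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def joint_loss(action_logits, action_label, motion_pred, motion_target, weight=1.0):
    """Hypothetical joint objective: action loss + weight * pretext motion loss."""
    return cross_entropy(action_logits, action_label) + weight * mse(motion_pred, motion_target)


# Usage: uniform logits over 2 classes, and a 2-pixel motion map with one wrong pixel.
loss = joint_loss([0.0, 0.0], 0, [1.0, 0.0], [1.0, 1.0], weight=2.0)
print(round(loss, 4))  # ln(2) + 2 * 0.5 = 1.6931
```

With `weight=0` the objective reduces to plain action classification; larger values push the shared features to encode more of the motion signal, which is the intuition behind using a pretext task to intertwine motion and appearance.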