AIApr 15
FieldWorkArena: Agentic AI Benchmark for Real Field Work TasksJun Takahashi, Atsunori Moteki, Akiyoshi Uchida et al. · cmu
This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With the recent increase in demand for agentic AI, they are built to detect and document safety hazards, procedural violations, and other critical incidents across real-world manufacturing and retail environments. Whereas most agentic AI benchmarks focus on performance in simulated or digital environments, our work addresses the fundamental challenge of evaluating agents in the real-world. In this paper, we improve the evaluation function from previous methods to assess the performance of agentic AI in diverse real-world tasks. Our dataset comprises on-site captured images/videos in factories, warehouses and retails. Tasks were meticulously developed through interviews with site workers and managers. Evaluation results confirmed that performance evaluation considering the characteristics of Multimodal LLM (MLLM) such as GPT-4o is feasible. Furthermore, this study identifies both the effectiveness and limitations of the proposed new evaluation methodology. The complete dataset and evaluation program are publicly accessible on the website (https://en-documents.research.global.fujitsu.com/fieldworkarena/)
ROAug 2, 2024
Structure from Motion-based Motion Estimation and 3D Reconstruction of Unknown Shaped Space DebrisKentaro Uno, Takehiro Matsuoka, Akiyoshi Uchida et al.
With the boost in the number of spacecraft launches in the current decades, the space debris problem is daily becoming significantly crucial. For sustainable space utilization, the continuous removal of space debris is the most severe problem for humanity. To maximize the reliability of the debris capture mission in orbit, accurate motion estimation of the target is essential. Space debris has lost its attitude and orbit control capabilities, and its shape is unknown due to the break. This paper proposes the Structure from Motion-based algorithm to perform unknown shaped space debris motion estimation with limited resources, where only 2D images are required as input. The method then outputs the reconstructed shape of the unknown object and the relative pose trajectory between the target and the camera simultaneously, which are exploited to estimate the target's motion. The method is quantitatively validated with the realistic image dataset generated by the microgravity experiment in a 2D air-floating testbed and 3D kinematic simulation.
CVJul 7, 2021
Action Units Recognition Using Improved Pairwise Deep ArchitectureJunya Saito, Xiaoyu Mi, Akiyoshi Uchida et al.
Facial Action Units (AUs) represent a set of facial muscular activities and various combinations of AUs can represent a wide range of emotions. AU recognition is often used in many applications, including marketing, healthcare, education, and so forth. Although a lot of studies have developed various methods to improve recognition accuracy, it still remains a major challenge for AU recognition. In the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition, we proposed a new automatic Action Units (AUs) recognition method using a pairwise deep architecture to derive the Pseudo-Intensities of each AU and then convert them into predicted intensities. This year, we introduced a new technique to last year's framework to further reduce AU recognition errors due to temporary face occlusion such as hands on face or large face orientation. We obtained a score of 0.65 in the validation data set for this year's competition.
CVJul 7, 2021
Multi-modal Affect Analysis using standardized data within subjects in the WildSachihiro Youoku, Takahisa Yamamoto, Junya Saito et al.
Human affective recognition is an important factor in human-computer interaction. However, the method development with in-the-wild data is not yet accurate enough for practical usage. In this paper, we introduce the affective recognition method focusing on facial expression (EXP) and valence-arousal calculation that was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest. When annotating facial expressions from a video, we thought that it would be judged not only from the features common to all people, but also from the relative changes in the time series of individuals. Therefore, after learning the common features for each frame, we constructed a facial expression estimation model and valence-arousal model using time-series data after combining the common features and the standardized features for each video. Furthermore, the above features were learned using multi-modal data such as image features, AU, Head pose, and Gaze. In the validation set, our model achieved a facial expression score of 0.546. These verification results reveal that our proposed framework can improve estimation accuracy and robustness effectively.
CVOct 1, 2020
Action Units Recognition by Pairwise Deep ArchitectureJunya Saito, Ryosuke Kawamura, Akiyoshi Uchida et al.
In this paper, we propose a new automatic Action Units (AUs) recognition method used in a competition, Affective Behavior Analysis in-the-wild (ABAW). Our method tackles a problem of AUs label inconsistency among subjects by using pairwise deep architecture. While the baseline score is 0.31, our method achieved 0.67 in validation dataset of the competition.