Sanem Sariel

CV
h-index: 24
5 papers
55 citations
Novelty: 48%
AI Score: 32

5 Papers

CV | Oct 25, 2021 | Code
A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction

Gamze Akyol, Sanem Sariel, Eren Erdal Aksoy

Despite decades of research, understanding human manipulation activities remains one of the most attractive and challenging research topics in computer vision and robotics. Recognition and prediction of observed human manipulation actions have their roots in applications such as human-robot interaction and robot learning from demonstration. The current research trend relies heavily on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. These networks, however, incur immense computational complexity in order to process high-dimensional raw data. Unlike related work, we introduce a deep graph autoencoder that jointly learns recognition and prediction of manipulation tasks from symbolic scene graphs instead of relying on structured Euclidean data. Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting the future graphs. The input of the proposed network is a set of semantic graphs that store the spatial relations between subjects and objects in the scene. The network output is a label set representing the detected and predicted class types. We benchmark our new model against different state-of-the-art methods on two datasets, MANIAC and MSRC-9, and show that our proposed model can achieve better performance. We also release our source code at https://github.com/gamzeakyol/GNet.
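
To make the input representation concrete, here is a minimal sketch of a symbolic scene graph that stores spatial relations between subjects and objects and converts them to an adjacency matrix. It is only an illustration, not the authors' code: the `Relation` class, the `adjacency` helper, and the object and relation names ("hand", "touching", and so on) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge of a scene graph (hypothetical structure, for illustration)."""
    subject: str
    relation: str
    obj: str


def adjacency(relations):
    """Return sorted node names and a symmetric 0/1 adjacency matrix."""
    nodes = sorted({r.subject for r in relations} | {r.obj for r in relations})
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for r in relations:
        i, j = index[r.subject], index[r.obj]
        adj[i][j] = adj[j][i] = 1  # spatial relations treated as undirected here
    return nodes, adj


# Hypothetical scene: a hand holds a knife that touches a cucumber.
scene = [
    Relation("hand", "touching", "knife"),
    Relation("knife", "touching", "cucumber"),
]
nodes, adj = adjacency(scene)
```

A sequence of such graphs, one per frame or key event, would then be the kind of input a graph network consumes; the relation labels are dropped here for brevity but could be kept as edge features.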

RO | Nov 11, 2020 | Code
FINO-Net: A Deep Multimodal Sensor Fusion Framework for Manipulation Failure Detection

Arda Inceoglu, Eren Erdal Aksoy, Abdullah Cihan Ak et al.

Safe manipulation in unstructured environments is a challenging problem for service robots. A failure detection system is needed to monitor and detect unintended outcomes. We propose FINO-Net, a novel deep neural network based on multimodal sensor fusion, to detect and identify manipulation failures. We also introduce a multimodal dataset containing 229 real-world manipulation samples recorded with a Baxter robot. Our network combines RGB, depth and audio readings to effectively detect and classify failures. Results indicate that fusing RGB with depth and audio modalities significantly improves the performance. FINO-Net achieves 98.60% detection and 87.31% classification accuracy on our novel dataset. Code and data are publicly available at https://github.com/ardai/fino-net.

CV | Mar 15, 2025
Real-Time Manipulation Action Recognition with a Factorized Graph Sequence Encoder

Enes Erdogan, Eren Erdal Aksoy, Sanem Sariel

Recognition of human manipulation actions in real time is essential for safe and effective human-robot interaction and collaboration. The challenge lies in developing a model that is both lightweight enough for real-time execution and capable of generalization. While some existing methods in the literature can run in real time, they struggle with temporal scalability, i.e., they fail to adapt to long-duration manipulations effectively. To address this, leveraging generalizable scene graph representations, we propose a new Factorized Graph Sequence Encoder network that not only runs in real time but also scales effectively in the temporal dimension, thanks to its factorized encoder architecture. Additionally, we introduce the Hand Pooling operation, a simple pooling operation for more focused extraction of the graph-level embeddings. Our model outperforms the previous state-of-the-art real-time approach, achieving a 14.3% and 5.6% improvement in F1-macro score on the KIT Bimanual Action (Bimacs) Dataset and Collaborative Action (CoAx) Dataset, respectively. Moreover, we conduct an extensive ablation study to validate our network design choices. Finally, we compare our model with its architecturally similar RGB-based model on the Bimacs dataset and show the limitations of this model in contrast to ours on such an object-centric manipulation dataset.
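
For reference, the F1-macro score reported above is the unweighted mean of the per-class F1 scores, so every action class counts equally regardless of how often it occurs. A minimal sketch in plain Python follows; the `f1_macro` helper and the toy labels are illustrative assumptions and do not come from the Bimacs or CoAx datasets.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical action labels.
y_true = ["cut", "cut", "pour", "pour"]
y_pred = ["cut", "pour", "pour", "pour"]
score = f1_macro(y_true, y_pred)
```

In this toy case the per-class F1 values are 2/3 for "cut" and 4/5 for "pour", giving a macro average of 11/15.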

AI | Apr 13, 2021
Two-stage training algorithm for AI robot soccer

Taeyoung Kim, Luiz Felipe Vecchietti, Kyujin Choi et al.

In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. In heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method yields limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, two-stage heterogeneous centralized training, which allows the training of multiple roles of heterogeneous agents, is proposed. During training, two training processes are conducted in series. One stage trains each agent according to its role, aiming to maximize individual role rewards. The other trains the agents as a whole so that they learn cooperative behaviors while attempting to maximize shared collective rewards, e.g., team rewards. Because these two training processes are conducted in series at every timestep, agents can learn to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. Simulation results show that the proposed method can train the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than other approaches to training cooperative multi-agent systems.

RO | Jan 24, 2020
What went wrong?: Identification of Everyday Object Manipulation Anomalies

Dogan Altan, Sanem Sariel

Extending the abilities of service robots is important for expanding what they can achieve in everyday manipulation tasks. It is equally essential that they can determine what they cannot achieve in certain cases, due to either anomalies or permanent failures during task execution. Robots need to identify these situations and reveal the reasons behind them in order to overcome and recover from them. In this paper, we propose and analyze a Long Short-Term Memory-based (LSTM-based) awareness approach to reveal the reasons behind an anomaly that occurs during a manipulation episode in an unstructured environment. The proposed method takes into account the real-time observations of the robot by fusing visual, auditory and proprioceptive sensory modalities. We also provide a comparative analysis of our method with Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs). The symptoms of anomalies are first learned from a given training set; they can then be classified in real time based on the learned models. The approaches are evaluated on a Baxter robot executing object manipulation scenarios. The results indicate that the LSTM-based method outperforms the other methods, with a 0.94 classification rate in revealing the causes of anomalies in case of an unexpected deviation.