AIJan 7, 2023
Visual Story Generation Based on Emotion and KeywordsYuetian Chen, Ruohua Li, Bowen Shi et al.
Automated visual story generation aims to produce stories with corresponding illustrations that exhibit coherence, progression, and adherence to characters' emotional development. This work proposes a story generation pipeline to co-create visual stories with the users. The pipeline allows the user to control events and emotions on the generated content. The pipeline includes two parts: narrative and image generation. For narrative generation, the system generates the next sentence using user-specified keywords and emotion labels. For image generation, diffusion models are used to create a visually appealing image corresponding to each generated sentence. Further, object recognition is applied to the generated images to allow objects in these images to be mentioned in future story development.
ROJul 15, 2021Code
Learning Sparse Interaction Graphs of Partially Detected Pedestrians for Trajectory PredictionZhe Huang, Ruohua Li, Kazuki Shin et al.
Multi-pedestrian trajectory prediction is an indispensable element of autonomous systems that safely interact with crowds in unstructured environments. Many recent efforts in trajectory prediction algorithms have focused on understanding social norms behind pedestrian motions. Yet we observe these works usually hold two assumptions, which prevent them from being smoothly applied to robot applications: (1) positions of all pedestrians are consistently tracked, and (2) the target agent pays attention to all pedestrians in the scene. The first assumption leads to biased interaction modeling with incomplete pedestrian data. The second assumption introduces aggregation of redundant surrounding information, and the target agent may be affected by unimportant neighbors or present overly conservative motion. Thus, we propose Gumbel Social Transformer, in which an Edge Gumbel Selector samples a sparse interaction graph of partially detected pedestrians at each time step. A Node Transformer Encoder and a Masked LSTM encode pedestrian features with sampled sparse graphs to predict trajectories. We demonstrate that our model overcomes potential problems caused by the aforementioned assumptions, and our approach outperforms related works in trajectory prediction benchmarks. Code is available at \url{https://github.com/tedhuang96/gst}.
ROJun 30, 2020Code
Long-term Pedestrian Trajectory Prediction using Mutable Intention Filter and Warp LSTMZhe Huang, Aamir Hasan, Kazuki Shin et al.
Trajectory prediction is one of the key capabilities for robots to safely navigate and interact with pedestrians. Critical insights from human intention and behavioral patterns need to be integrated to effectively forecast long-term pedestrian behavior. Thus, we propose a framework incorporating a Mutable Intention Filter and a Warp LSTM (MIF-WLSTM) to simultaneously estimate human intention and perform trajectory prediction. The Mutable Intention Filter is inspired by particle filtering and genetic algorithms, where particles represent intention hypotheses that can be mutated throughout the pedestrian motion. Instead of predicting sequential displacement over time, our Warp LSTM learns to generate offsets on a full trajectory predicted by a nominal intention-aware linear model, which considers the intention hypotheses during filtering process. Through experiments on a publicly available dataset, we show that our method outperforms baseline approaches and demonstrate the robust performance of our method under abnormal intention-changing scenarios. Code is available at https://github.com/tedhuang96/mifwlstm.