CVJul 24, 2019
Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational InteractantsMason Swofford, John Charles Peruzzi, Nathan Tsoi et al.
We propose a data-driven approach to detect conversational groups by identifying spatial arrangements typical of these focused social encounters. Our approach uses a novel Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group, considering their social context. The predicted pair-wise affinities are then used in a graph clustering framework to identify both small (e.g., dyads) and large groups. The results from our evaluation on multiple, established benchmarks suggest that combining powerful deep learning methods with classical clustering techniques can improve the detection of conversational groups in comparison to prior approaches. Finally, we demonstrate the practicality of our approach in a human-robot interaction scenario. Our efforts show that our work advances group detection not only in theory, but also in practice.
CVOct 7, 2018
Image Completion on CIFAR-10Mason Swofford
This project performed image completion on CIFAR-10, a dataset of 60,000 32x32 RGB images, using three different neural network architectures: fully convolutional networks, convolutional networks with fully connected layers, and encoder-decoder convolutional networks. The highest performing model was a deep fully convolutional network, which was able to achieve a mean squared error of .015 when comparing the original image pixel values with the predicted pixel values. As well, this network was able to output in-painted images which appeared real to the human eye.
CVOct 7, 2018
Conversational Group Detection With Deep Convolutional NetworksMason Swofford, John Peruzzi, Marynel Vázquez
Detection of interacting and conversational groups from images has applications in video surveillance and social robotics. In this paper we build on prior attempts to find conversational groups by detection of social gathering spaces called o-spaces used to assign people to groups. As our contributions to the task, we are the first paper to incorporate features extracted from the room layout image, and the first to incorporate a deep network to generate an image representation of the proposed o-spaces. Specifically, this novel network builds on the PointNet architecture which allows unordered inputs of variable sizes. We present accuracies which demonstrate the ability to rival and sometimes outperform the best models, but due to a data imbalance issue we do not yet outperform existing models in our test results.