CVFeb 15, 2021Code
Win-Fail Action RecognitionParitosh Parmar, Brendan Morris
Current video/action understanding systems have demonstrated impressive performance on large recognition tasks. However, they might be limiting themselves to learning to recognize spatiotemporal patterns, rather than attempting to thoroughly understand the actions. To spur progress in the direction of a truer, deeper understanding of videos, we introduce the task of win-fail action recognition -- differentiating between successful and failed attempts at various activities. We introduce a first of its kind paired win-fail action understanding dataset with samples from the following domains: "General Stunts," "Internet Wins-Fails," "Trick Shots," and "Party Games." Unlike existing action recognition datasets, intra-class variation is high making the task challenging, yet feasible. We systematically analyze the characteristics of the win-fail task/dataset with prototypical action recognition networks and a novel video retrieval task. While current action recognition methods work well on our task/dataset, they still leave a large gap to achieve high performance. We hope to motivate more work towards the true understanding of actions/videos. Dataset will be available from https://github.com/ParitoshParmar/Win-Fail-Action-Recognition.
CVJan 13, 2021Code
Piano Skills AssessmentParitosh Parmar, Jaiden Reddy, Brendan Morris
Can a computer determine a piano player's skill level? Is it preferable to base this assessment on visual analysis of the player's performance or should we trust our ears over our eyes? Since current CNNs have difficulty processing long video videos, how can shorter clips be sampled to best reflect the players skill level? In this work, we collect and release a first-of-its-kind dataset for multimodal skill assessment focusing on assessing piano player's skill level, answer the asked questions, initiate work in automated evaluation of piano playing skills and provide baselines for future work. Dataset is available from: https://github.com/ParitoshParmar/Piano-Skills-Assessment.
CVDec 10, 2019Code
HalluciNet-ing Spatiotemporal Representations Using a 2D-CNNParitosh Parmar, Brendan Morris
Spatiotemporal representations learned using 3D convolutional neural networks (CNN) are currently used in state-of-the-art approaches for action related tasks. However, 3D-CNN are notorious for being memory and compute resource intensive as compared with more simple 2D-CNN architectures. We propose to hallucinate spatiotemporal representations from a 3D-CNN teacher with a 2D-CNN student. By requiring the 2D-CNN to predict the future and intuit upcoming activity, it is encouraged to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action related task in a multitask learning setting. Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as with limited computing power and/or lower bandwidth. Codebase is available here: https://github.com/ParitoshParmar/HalluciNet.
CVMar 1, 2016
Keypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network FeaturesJT Turner, Kalyan Gupta, Brendan Morris et al.
Although recent advances in regional Convolutional Neural Networks (CNNs) enable them to outperform conventional techniques on standard object detection and classification tasks, their response time is still slow for real-time performance. To address this issue, we propose a method for region proposal as an alternative to selective search, which is used in current state-of-the art object detection algorithms. We evaluate our Keypoint Density-based Region Proposal (KDRP) approach and show that it speeds up detection and classification on fine-grained tasks by 100% versus the existing selective search region proposal technique without compromising classification accuracy. KDRP makes the application of CNNs to real-time detection and classification feasible.