SY Jan 13, 2018
Computation of Extended Robust Kalman Filter for Real-Time Attitude and Position Estimation
Gaurav Yengera, Roberto Inoue, Mundla Narasimhappa et al.
This paper addresses the implementation of the extended robust Kalman filter (ERKF), which was developed to account for uncertainties in the parameter matrices of the underlying state-space model. A key contribution of this work is the demonstration of a method for real-time computation of the filter on parallel computing devices. The solution of the filter is expressed as a set of simultaneous linear equations, which can then be solved by QR decomposition using Givens rotations. This paper also presents the application of the ERKF in the development of an attitude and position reference system for a cargo transport vehicle. This work concludes by analyzing the performance of the ERKF and verifying the validity of the Givens rotation method.
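To make the Givens-rotation step concrete, the following is a minimal sketch of solving a small square linear system by QR decomposition with Givens rotations. It is not the paper's implementation: the function name `givens_qr_solve`, the dense list-of-lists layout, and the assumption of a nonsingular square matrix are all illustrative choices.

```python
import math

def givens_qr_solve(A, b):
    """Solve the square system A x = b via QR with Givens rotations.

    Hypothetical helper for illustration; assumes A is nonsingular.
    """
    n = len(A)
    # Work on copies so the caller's data is left untouched.
    R = [row[:] for row in A]
    y = b[:]
    # Zero each sub-diagonal entry R[i][j] by rotating rows j and i.
    for j in range(n):
        for i in range(j + 1, n):
            if R[i][j] == 0.0:
                continue  # already zero, no rotation needed
            r = math.hypot(R[j][j], R[i][j])
            c, s = R[j][j] / r, R[i][j] / r
            for k in range(j, n):
                top, bot = R[j][k], R[i][k]
                R[j][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
            # Apply the same rotation to the right-hand side (builds Q^T b).
            y[j], y[i] = c * y[j] + s * y[i], -s * y[j] + c * y[i]
    # Back substitution on the upper-triangular system R x = Q^T b.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - acc) / R[i][i]
    return x

x = givens_qr_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs are independent of one another; this is the general property that makes Givens-based QR a common choice for parallel hardware.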
LG Jun 8, 2021
Curriculum Design for Teaching via Demonstrations: Theory and Applications
Gaurav Yengera, Rati Devidze, Parameswaran Kamalaruban et al.
We consider the problem of teaching via demonstrations in sequential decision-making settings. In particular, we study how to design a personalized curriculum over demonstrations to speed up the learner's convergence. We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC). Our unified strategy induces a ranking over demonstrations based on a notion of difficulty scores computed w.r.t. the teacher's optimal policy and the learner's current policy. Compared to the state of the art, our strategy doesn't require access to the learner's internal dynamics and still enjoys similar convergence guarantees under mild technical conditions. Furthermore, we adapt our curriculum strategy to the setting where no teacher agent is present using task-specific difficulty scores. Experiments on a synthetic car driving environment and navigation-based environments demonstrate the effectiveness of our curriculum strategy.
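As a rough illustration of ranking demonstrations by difficulty, the sketch below scores each demonstration under two tabular policies. The abstract does not define the difficulty score, so this is an assumption made for the example only: difficulty is taken to be the negative log-likelihood of the demonstration under a policy, and demonstrations are ranked by how much harder they are for the learner than for the teacher. The names `neg_log_likelihood` and `rank_demonstrations` and the dictionary policy format are hypothetical.

```python
import math

def neg_log_likelihood(demo, policy):
    """Assumed difficulty score: -log probability of the demo under a policy.

    demo:   list of (state, action) pairs
    policy: dict mapping state -> dict mapping action -> probability
    """
    return -sum(math.log(policy[s][a]) for s, a in demo)

def rank_demonstrations(demos, teacher_policy, learner_policy):
    """Order demos from most to least useful under the assumed score.

    A demo ranks high when it is hard for the learner (low likelihood)
    but easy for the teacher (high likelihood).
    """
    def score(demo):
        return (neg_log_likelihood(demo, learner_policy)
                - neg_log_likelihood(demo, teacher_policy))
    return sorted(demos, key=score, reverse=True)

teacher = {"s0": {"left": 0.9, "right": 0.1}}
learner = {"s0": {"left": 0.2, "right": 0.8}}
demos = [[("s0", "right")], [("s0", "left")]]
ranked = rank_demonstrations(demos, teacher, learner)
```

In this toy case the demonstration taking `left` is ranked first, since the teacher favors it and the learner currently does not. The score uses only the learner's current policy, not its update rule, which is in the spirit of not requiring access to the learner's internal dynamics.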
CV Nov 28, 2018
Future-State Predicting LSTM for Early Surgery Type Recognition
Siddharth Kannan, Gaurav Yengera, Didier Mutter et al.
This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can be beneficial to the development of 'smart' OR systems that can provide automatic context-aware assistance, and can also enable quick database indexing. The task is, however, fraught with challenges specific to laparoscopic videos, such as high visual similarity across surgeries and large variations in video duration. To capture the spatio-temporal dependencies in these videos, we choose as our model a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first approach is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second approach, referred to as 'Future-State Predicting LSTM', trains an LSTM to predict information related to future frames, which helps in distinguishing between the different types of surgeries. We evaluate our approaches on a large dataset of 425 laparoscopic videos containing 9 types of surgeries (Laparo425), and achieve an average accuracy of 75% after observing only the first 10 minutes of a surgery. These results are promising from a practical standpoint and also encouraging for other types of image-guided surgeries.
CV May 22, 2018
Less is More: Surgical Phase Recognition with Less Annotations through Self-Supervised Pre-training of CNN-LSTM Networks
Gaurav Yengera, Didier Mutter, Jacques Marescaux et al.
Real-time algorithms for automatically recognizing surgical phases are needed to develop systems that can provide assistance to surgeons, enable better management of operating room (OR) resources and consequently improve safety within the OR. State-of-the-art surgical phase recognition algorithms using laparoscopic videos are based on fully supervised training. This limits their potential for widespread application, since creating manual annotations is expensive given the numerous types of existing surgeries and the vast number of laparoscopic videos available. In this work, we propose a new self-supervised pre-training approach based on the prediction of remaining surgery duration (RSD) from laparoscopic videos. The RSD prediction task is used to pre-train a convolutional neural network (CNN) and long short-term memory (LSTM) network in an end-to-end manner. Our proposed approach utilizes all available data and reduces the reliance on annotated data, thereby facilitating the scaling up of surgical phase recognition algorithms to different kinds of surgeries. Additionally, we present EndoN2N, an end-to-end trained CNN-LSTM model for surgical phase recognition, and evaluate the performance of our approach on a dataset of 120 cholecystectomy laparoscopic videos (Cholec120). This work also presents the first systematic study of self-supervised pre-training approaches to understand the amount of annotations required for surgical phase recognition. Interestingly, the proposed RSD pre-training approach leads to performance improvement even when all the training data is manually annotated, and it outperforms the single pre-training approach for surgical phase recognition presently published in the literature. It is also observed that end-to-end training of CNN-LSTM networks boosts surgical phase recognition performance.
CV Feb 9, 2018
RSDNet: Learning to Predict Remaining Surgery Duration from Laparoscopic Videos Without Manual Annotations
Andru Putra Twinanda, Gaurav Yengera, Didier Mutter et al.
Accurate surgery duration estimation is necessary for optimal OR planning, which plays an important role in patient comfort and safety as well as resource optimization. It is, however, challenging to preoperatively predict surgery duration since it varies significantly depending on the patient condition, surgeon skills, and intraoperative situation. In this paper, we propose a deep learning pipeline, referred to as RSDNet, which automatically estimates the remaining surgery duration (RSD) intraoperatively by using only visual information from laparoscopic videos. Previous state-of-the-art approaches for RSD prediction are dependent on manual annotation, whose generation requires expensive expert knowledge and is time-consuming, especially considering the numerous types of surgeries performed in a hospital and the large number of laparoscopic videos available. A crucial feature of RSDNet is that it does not depend on any manual annotation during training, making it easily scalable to many kinds of surgeries. The generalizability of our approach is demonstrated by testing the pipeline on two large datasets containing different types of surgeries: 120 cholecystectomy and 170 gastric bypass videos. The experimental results also show that the proposed network significantly outperforms a traditional method of estimating RSD without utilizing manual annotation. Further, this work provides a deeper insight into the deep learning network through visualization and interpretation of the features that are automatically learned.
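The reason RSD training needs no manual annotation can be shown with a short sketch: the regression target for every frame follows directly from the video's own length. The function name `rsd_targets`, the one-frame-per-second sampling rate, and the choice of minutes as the unit are assumptions made for this example, not details taken from the paper.

```python
def rsd_targets(num_frames, fps=1.0):
    """Per-frame remaining surgery duration in minutes.

    Hypothetical helper: the target for frame t is the total video
    duration minus the time elapsed at frame t, so no human labeling
    is involved.
    """
    total_seconds = num_frames / fps
    return [(total_seconds - t / fps) / 60.0 for t in range(num_frames)]

# A two-minute video sampled at 1 frame per second (assumed rate).
targets = rsd_targets(120, fps=1.0)
first, midpoint = targets[0], targets[60]
```

Here `first` is 2.0 minutes and `midpoint` is 1.0 minute. A network trained to regress these values from frames sees supervision for every frame of every recorded video, which is what allows the approach to scale across surgery types.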