Siddhant Jain

h-index: 12

6 papers

343 citations

Novelty: 32%

AI Score: 40

Ranked #100,327 of 205,806 authors (top 49%); #32,846 in CV (top 56%)

6 Papers

2.6 | CL | Apr 17

Sentiment Analysis of German Sign Language Fairy Tales

Fabrizio Nunnari, Siddhant Jain, Patrick Gebhard

We present a dataset and a model for sentiment analysis of German sign language (DGS) fairy tales. First, we perform sentiment analysis for three levels of valence (negative, neutral, positive) on German fairy-tale text segments using four large language models (LLMs) and majority voting, reaching an inter-annotator agreement of 0.781 (Krippendorff's alpha). Second, we extract face and body motion features from each corresponding DGS video segment using MediaPipe. Finally, we train an explainable model (based on XGBoost) to predict negative, neutral, or positive sentiment from video features. Results show an average balanced accuracy of 0.631. A thorough analysis of the most important features reveals that, in addition to eyebrow and mouth motion on the face, the motion of the hips, elbows, and shoulders also contributes considerably to discriminating the conveyed sentiment, indicating an equal importance of face and body for sentiment communication in sign language.
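The two quantities named in this abstract, majority voting over annotator labels and balanced accuracy, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `tie_label` fallback, and the sample votes are hypothetical, and the abstract does not say how ties among the four LLMs are resolved.

```python
from collections import Counter


def majority_vote(votes, tie_label="neutral"):
    """Return the most common label among the votes.

    Falling back to tie_label when the top two counts are equal is an
    assumption made for this sketch, not a rule stated in the abstract.
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical votes from four annotators for one text segment.
segment_label = majority_vote(["positive", "positive", "neutral", "negative"])
```

Balanced accuracy averages recall per class, so a classifier that always predicts the most frequent of three sentiment classes scores about 0.33 rather than the share of that class.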

CV | Apr 1, 2024

Video Interpolation with Diffusion Models

Siddhant Jain, Daniel Watson, Eric Tabellion et al.

We present VIDIM, a generative model for video interpolation, which creates short videos given a start and end frame. In order to achieve high fidelity and generate motions unseen in the input data, VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video. We compare VIDIM to previous state-of-the-art methods on video interpolation, and demonstrate how such works fail in most settings where the underlying motion is complex, nonlinear, or ambiguous while VIDIM can easily handle such cases. We additionally demonstrate how classifier-free guidance on the start and end frame and conditioning the super-resolution model on the original high-resolution frames without additional parameters unlocks high-fidelity results. VIDIM is fast to sample from as it jointly denoises all the frames to be generated, requires less than a billion parameters per diffusion model to produce compelling results, and still enjoys scalability and improved quality at larger parameter counts.

NE | Nov 14, 2019

Performance evaluation of deep neural networks for forecasting time-series with multiple structural breaks and high volatility

Rohit Kaushik, Shikhar Jain, Siddhant Jain et al.

The problem of automatic and accurate forecasting of time-series data has always been an interesting challenge for the machine learning and forecasting community. A majority of real-world time-series problems have non-stationary characteristics that make the understanding of trend and seasonality difficult. Our interest in this paper is to study the applicability of popular deep neural networks (DNNs) as function approximators for non-stationary time-series forecasting (TSF). We evaluate the following DNN models: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN), RNN with Long Short-Term Memory (LSTM-RNN), and RNN with Gated Recurrent Unit (GRU-RNN). These DNN methods have been evaluated on data from 10 popular Indian financial stocks. Further, the performance evaluation of these DNNs has been carried out in multiple independent runs for two settings of forecasting: (1) single-step forecasting, and (2) multi-step forecasting. These DNN methods show convincing performance for single-step forecasting (one-day-ahead forecast). For multi-step forecasting (multiple-days-ahead forecast), we have evaluated the methods for different forecast periods. The performance of these methods demonstrates that long forecast periods have an adverse effect on performance.
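The difference between the single-step and multi-step settings can be sketched as a windowing step that prepares training pairs. This is an illustrative assumption, not the paper's pipeline: `make_windows`, `lookback`, `horizon`, and the toy series are hypothetical names and values, and predicting all future steps at once is only one of several ways to set up multi-step forecasting.

```python
def make_windows(series, lookback, horizon):
    """Split a series into (input window, target window) pairs.

    horizon=1 yields single-step targets (one day ahead);
    horizon>1 yields multi-step targets (several days ahead).
    """
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + lookback]
        targets = series[start + lookback:start + lookback + horizon]
        pairs.append((inputs, targets))
    return pairs


# Toy closing prices, invented for illustration only.
prices = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5]

single_step = make_windows(prices, lookback=3, horizon=1)  # 3 pairs
multi_step = make_windows(prices, lookback=3, horizon=2)   # 2 pairs
```

A longer horizon leaves fewer training pairs from the same series and asks the model to predict further from its last observation.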

CV | Jun 11, 2019

Few-Shot Point Cloud Region Annotation with Human in the Loop

Siddhant Jain, Sowmya Munukutla, David Held

We propose a point cloud annotation framework that employs human-in-the-loop learning to enable the creation of large point cloud datasets with per-point annotations. Sparse labels from a human annotator are iteratively propagated to generate a full segmentation of the network by fine-tuning a pre-trained model of an allied task via a few-shot learning paradigm. We show that the proposed framework significantly reduces the amount of human interaction needed to annotate point clouds, without sacrificing the quality of the annotations. Our experiments also suggest the suitability of the framework for annotating large datasets by noting a reduction in human interaction as the number of full annotations completed by the system increases. Finally, we demonstrate the flexibility of the framework to support multiple different annotations of the same point cloud, enabling the creation of datasets with different granularities of annotation.

ML | Oct 29, 2018

An Amalgamation of Classical and Quantum Machine Learning For the Classification of Adenocarcinoma and Squamous Cell Carcinoma Patients

Siddhant Jain, Jalal Ziauddin, Paul Leonchyk et al.

The ability to accurately classify disease subtypes is of vital importance, especially in oncology, where this capability could have a life-saving impact. Here we report a classification between two subtypes of non-small cell lung cancer, namely adenocarcinoma vs. squamous cell carcinoma. The data consists of approximately 20,000 gene expression values for each of 104 patients. The data was curated from [1] [2]. We used an amalgamation of classical and quantum machine learning models to successfully classify these patients. We utilized feature selection methods based on univariate statistics in addition to XGBoost [3]. A novel and proprietary data representation method developed by one of the authors, called QCrush, was also used, as it was designed to incorporate a maximal amount of information under the size constraints of the D-Wave quantum annealing computer. The machine learning was performed by a Quantum Boltzmann Machine. This paper will report our results, the various classical methods, and the quantum machine learning approach we utilized.

CV | Apr 9, 2017

BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis

Shanxin Yuan, Qi Ye, Bjorn Stenger et al.

In this paper we introduce a large-scale hand pose dataset, collected using a novel capture method. Existing datasets are either generated synthetically or captured using depth sensors: synthetic datasets exhibit a certain level of appearance difference from real depth images, and real datasets are limited in quantity and coverage, mainly due to the difficulty of annotating them. We propose a tracking system with six 6D magnetic sensors and inverse kinematics to automatically obtain 21-joint hand pose annotations of depth maps captured with minimal restriction on the range of motion. The capture protocol aims to fully cover the natural hand pose space. As shown in embedding plots, the new dataset exhibits a significantly wider and denser range of hand poses compared to existing benchmarks. Current state-of-the-art methods are evaluated on the dataset, and we demonstrate significant improvements in cross-benchmark performance. We also show significant improvements in egocentric hand pose estimation with a CNN trained on the new dataset.