CVMar 5, 2021
Harnessing Geometric Constraints from Emotion Labels to improve Face VerificationAnand Ramakrishnan, Minh Pham, Jacob Whitehill
For the task of face verification, we explore the utility of harnessing auxiliary facial emotion labels to impose explicit geometric constraints on the embedding space when training deep embedding models. We introduce several novel loss functions that, in conjunction with a standard Triplet Loss [43], or ArcFace loss [10], provide geometric constraints on the embedding space; the labels for our loss functions can be provided using either manually annotated or automatically detected auxiliary emotion labels. Our method is implemented purely in terms of the loss function and does not require any changes to the neural network backbone of the embedding function.
LGAug 8, 2020
Error Autocorrelation Objective Function for Improved System ModelingAnand Ramakrishnan, Warren B. Jackson, Kent Evans
Deep learning models are trained to minimize the error between the model's output and the actual values. The typical cost function, the Mean Squared Error (MSE), arises from maximizing the log-likelihood of additive independent, identically distributed Gaussian noise. However, minimizing MSE fails to minimize the residuals' cross-correlations, leading to over-fitting and poor extrapolation of the model outside the training set (generalization). In this paper, we introduce a "whitening" cost function, the Ljung-Box statistic, which not only minimizes the error but also minimizes the correlations between errors, ensuring that the fits enforce compatibility with an independent and identically distributed (i.i.d) gaussian noise model. The results show significant improvement in generalization for recurrent neural networks (RNNs) (1d) and image autoencoders (2d). Specifically, we look at both temporal correlations for system-id in simulated and actual mechanical systems. We also look at spatial correlation in vision autoencoders to demonstrate that the whitening objective functions lead to much better extrapolation--a property very desirable for reliable control systems.
CVMay 19, 2020
Toward Automated Classroom Observation: Multimodal Machine Learning to Estimate CLASS Positive Climate and Negative ClimateAnand Ramakrishnan, Brian Zylich, Erin Ottmar et al.
In this work we present a multi-modal machine learning-based system, which we call ACORN, to analyze videos of school classrooms for the Positive Climate (PC) and Negative Climate (NC) dimensions of the CLASS observation protocol that is widely used in educational research. ACORN uses convolutional neural networks to analyze spectral audio features, the faces of teachers and students, and the pixels of each image frame, and then integrates this information over time using Temporal Convolutional Networks. The audiovisual ACORN's PC and NC predictions have Pearson correlations of $0.55$ and $0.63$ with ground-truth scores provided by expert CLASS coders on the UVA Toddler dataset (cross-validation on $n=300$ 15-min video segments), and a purely auditory ACORN predicts PC and NC with correlations of $0.36$ and $0.41$ on the MET dataset (test set of $n=2000$ videos segments). These numbers are similar to inter-coder reliability of human coders. Finally, using Graph Convolutional Networks we make early strides (AUC=$0.70$) toward predicting the specific moments (45-90sec clips) when the PC is particularly weak/strong. Our findings inform the design of automatic classroom observation and also more general video activity recognition and summary recognition systems.
LGDec 19, 2018
Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-TruthJacob Whitehill, Anand Ramakrishnan
Automatic machine learning-based detectors of various psychological and social phenomena (e.g., emotion, stress, engagement) have great potential to advance basic science. However, when a detector $d$ is trained to approximate an existing measurement tool (e.g., a questionnaire, observation protocol), then care must be taken when interpreting measurements collected using $d$ since they are one step further removed from the underlying construct. We examine how the accuracy of $d$, as quantified by the correlation $q$ of $d$'s outputs with the ground-truth construct $U$, impacts the estimated correlation between $U$ (e.g., stress) and some other phenomenon $V$ (e.g., academic performance). In particular: (1) We show that if the true correlation between $U$ and $V$ is $r$, then the expected sample correlation, over all vectors $\mathcal{T}^n$ whose correlation with $U$ is $q$, is $qr$. (2) We derive a formula for the probability that the sample correlation (over $n$ subjects) using $d$ is positive given that the true correlation is negative (and vice-versa); this probability can be substantial (around $20-30\%$) for values of $n$ and $q$ that have been used in recent affective computing studies. %We also show that this probability decreases monotonically in $n$ and in $q$. (3) With the goal to reduce the variance of correlations estimated by an automatic detector, we show that training multiple neural networks $d^{(1)},\ldots,d^{(m)}$ using different training architectures and hyperparameters for the same detection task provides only limited ``coverage'' of $\mathcal{T}^n$.