CVOct 20, 2022
Robustcaps: a transformation-robust capsule network for image classificationSai Raam Venkataraman, S. Balasubramanian, R. Raghunatha Sarma
Geometric transformations of the training data as well as the test data present challenges to the use of deep neural networks to vision-based learning tasks. In order to address this issue, we present a deep neural network model that exhibits the desirable property of transformation-robustness. Our model, termed RobustCaps, uses group-equivariant convolutions in an improved capsule network model. RobustCaps uses a global context-normalised procedure in its routing algorithm to learn transformation-invariant part-whole relationships within image data. This learning of such relationships allows our model to outperform both capsule and convolutional neural network baselines on transformation-robust classification tasks. Specifically, RobustCaps achieves state-of-the-art accuracies on CIFAR-10, FashionMNIST, and CIFAR-100 when the images in these datasets are subjected to train and test-time rotations and translations.
CVOct 20, 2022
Iterative collaborative routing among equivariant capsules for transformation-robust capsule networksSai Raam Venkataraman, S. Balasubramanian, R. Raghunatha Sarma
Transformation-robustness is an important feature for machine learning models that perform image classification. Many methods aim to bestow this property to models by the use of data augmentation strategies, while more formal guarantees are obtained via the use of equivariant models. We recognise that compositional, or part-whole structure is also an important aspect of images that has to be considered for building transformation-robust models. Thus, we propose a capsule network model that is, at once, equivariant and compositionality-aware. Equivariance of our capsule network model comes from the use of equivariant convolutions in a carefully-chosen novel architecture. The awareness of compositionality comes from the use of our proposed novel, iterative, graph-based routing algorithm, termed Iterative collaborative routing (ICR). ICR, the core of our contribution, weights the predictions made for capsules based on an iteratively averaged score of the degree-centralities of its nearest neighbours. Experiments on transformed image classification on FashionMNIST, CIFAR-10, and CIFAR-100 show that our model that uses ICR outperforms convolutional and capsule baselines to achieve state-of-the-art performance.
CVMar 15, 2022
Can you even tell left from right? Presenting a new challenge for VQASai Raam Venkatraman, Rishi Rao, S. Balasubramanian et al.
Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is the evaluation of compositional generalisation, or the ability of a model to answer well on scenes whose scene-setups are different from the training set. Therefore, for this purpose, we need datasets whose train and test sets differ significantly in composition. In this work, we present several quantitative measures of compositional separation and find that popular datasets for VQA are not good evaluators. To solve this, we present Uncommon Objects in Unseen Configurations (UOUC), a synthetic dataset for VQA. UOUC is at once fairly complex while also being well-separated, compositionally. The object-class of UOUC consists of 380 clasess taken from 528 characters from the Dungeons and Dragons game. The train set of UOUC consists of 200,000 scenes; whereas the test set consists of 30,000 scenes. In order to study compositional generalisation, simple reasoning and memorisation, each scene of UOUC is annotated with up to 10 novel questions. These deal with spatial relationships, hypothetical changes to scenes, counting, comparison, memorisation and memory-based reasoning. In total, UOUC presents over 2 million questions. UOUC also finds itself as a strong challenge to well-performing models for VQA. Our evaluation of recent models for VQA shows poor compositional generalisation, and comparatively lower ability towards simple reasoning. These results suggest that UOUC could lead to advances in research by being a strong benchmark for VQA.
17.2ARMar 18
ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency OptimizationPanuganti Chirag Sai, Gandholi Sarat, R. Raghunatha Sarma et al.
Reducing latency and energy consumption is critical to improving the efficiency of memory systems in modern computing. This work introduces ReLMXEL (Reinforcement Learning for Memory Controller with Explainable Energy and Latency Optimization), a explainable multi-agent online reinforcement learning framework that dynamically optimizes memory controller parameters using reward decomposition. ReLMXEL operates within the memory controller, leveraging detailed memory behavior metrics to guide decision-making. Experimental evaluations across diverse workloads demonstrate consistent performance gains over baseline configurations, with refinements driven by workload-specific memory access behaviour. By incorporating explainability into the learning process, ReLMXEL not only enhances performance but also increases the transparency of control decisions, paving the way for more accountable and adaptive memory system designs.
LGOct 4, 2020
Learning Compositional Structures for Deep Learning: Why Routing-by-agreement is NecessarySai Raam Venkatraman, Ankit Anand, S. Balasubramanian et al.
A formal description of the compositionality of neural networks is associated directly with the formal grammar-structure of the objects it seeks to represent. This formal grammar-structure specifies the kind of components that make up an object, and also the configurations they are allowed to be in. In other words, objects can be described as a parse-tree of its components -- a structure that can be seen as a candidate for building connection-patterns among neurons in neural networks. We present a formal grammar description of convolutional neural networks and capsule networks that shows how capsule networks can enforce such parse-tree structures, while CNNs do not. Specifically, we show that the entropy of routing coefficients in the dynamic routing algorithm controls this ability. Thus, we introduce the entropy of routing weights as a loss function for better compositionality among capsules. We show by experiments, on data with a compositional structure, that the use of this loss enables capsule networks to better detect changes in compositionality. Our experiments show that as the entropy of the routing weights increases, the ability to detect changes in compositionality reduces. We see that, without routing, capsule networks perform similar to convolutional neural networks in that both these models perform badly at detecting changes in compositionality. Our results indicate that routing is an important part of capsule networks -- effectively answering recent work that has questioned its necessity. We also, by experiments on SmallNORB, CIFAR-10, and FashionMNIST, show that this loss keeps the accuracy of capsule network models comparable to models that do not use it .
LGAug 4, 2019
Building Deep, Equivariant Capsule NetworksSairaam Venkatraman, S. Balasubramanian, R. Raghunatha Sarma
Capsule networks are constrained by the parameter-expensive nature of their layers, and the general lack of provable equivariance guarantees. We present a variation of capsule networks that aims to remedy this. We identify that learning all pair-wise part-whole relationships between capsules of successive layers is inefficient. Further, we also realise that the choice of prediction networks and the routing mechanism are both key to equivariance. Based on these, we propose an alternative framework for capsule networks that learns to projectively encode the manifold of pose-variations, termed the space-of-variation (SOV), for every capsule-type of each layer. This is done using a trainable, equivariant function defined over a grid of group-transformations. Thus, the prediction-phase of routing involves projection into the SOV of a deeper capsule using the corresponding function. As a specific instantiation of this idea, and also in order to reap the benefits of increased parameter-sharing, we use type-homogeneous group-equivariant convolutions of shallower capsules in this phase. We also introduce an equivariant routing mechanism based on degree-centrality. We show that this particular instance of our general model is equivariant, and hence preserves the compositional representation of an input under transformations. We conduct several experiments on standard object-classification datasets that showcase the increased transformation-robustness, as well as general performance, of our model to several capsule baselines.