When Neural Networks Using Different Sensors Create Similar Features
This is an incremental analysis for multimodal learning, with potential applications in areas like autonomous driving and robotics.
The paper examines a setting in which neural networks trained on different sensors produce similar features. It demonstrates that, for each sensor, the linear combination of last-layer features that correlates most with the other sensors corresponds to the classification components of the classification layer.
Multimodal problems are omnipresent in the real world: autonomous driving, robotic grasping, scene understanding, etc. We draw on the well-developed field of similarity analysis to provide an example of a problem where neural networks are trained on data from different sensors, and where the features extracted from these sensors still carry similar information. More precisely, we demonstrate that for each sensor, the linear combination of the features from the last layer that correlates the most with other sensors corresponds to the classification components of the classification layer.
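To make "the linear combination of features that correlates the most with other sensors" concrete, here is a minimal sketch using canonical correlation analysis (CCA) on synthetic data. CCA is an assumption on our part: the text above does not name the exact similarity method, and the function name `top_canonical_direction`, the feature dimensions, and the shared signal `z` are all hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

def top_canonical_direction(X, Y, eps=1e-6):
    """Return (w_x, w_y, rho) maximizing corr(X @ w_x, Y @ w_y).

    Illustrative CCA via whitening + SVD; not the paper's implementation.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Covariances, with a small ridge term for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M)
    # Map the leading singular vectors back to the original feature spaces.
    w_x = np.linalg.solve(Lx.T, U[:, 0])
    w_y = np.linalg.solve(Ly.T, Vt[0])
    return w_x, w_y, s[0]

# Synthetic stand-in for last-layer features of two sensor-specific networks.
# z plays the role of a shared, class-related signal (an assumption).
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
X = np.outer(z, np.ones(8)) + rng.standard_normal((n, 8))  # "sensor A" features
Y = np.outer(z, np.ones(6)) + rng.standard_normal((n, 6))  # "sensor B" features

w_x, w_y, rho = top_canonical_direction(X, Y)
print("top canonical correlation:", rho)
print("corr of sensor-A projection with shared signal:",
      abs(np.corrcoef(X @ w_x, z)[0, 1]))
```

In this toy setting the most cross-correlated direction of each feature set lines up with the shared signal `z`. That is loosely analogous to the claim above that the most cross-correlated directions coincide with the components used by the classification layer, but it only illustrates the idea under the stated assumptions.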