SPSep 1, 2022
Towards Multidimensional Textural Perception and Classification Through WhiskerPrasanna Kumar Routray, Aditya Sanjiv Kanade, Pauline Pounds et al.
Texture-based studies and designs have been in focus recently. Whisker-based multidimensional surface texture data is missing in the literature. This data is critical for robotics and machine perception algorithms in the classification and regression of textural surfaces. In this study, we present a novel sensor design to acquire multidimensional texture information. The surface texture's roughness and hardness were measured experimentally using sweeping and dabbing. Three machine learning models (SVM, RF, and MLP) showed excellent classification accuracy for the roughness and hardness of surface textures. We show that the combination of pressure and accelerometer data, collected from a standard machined specimen using the whisker sensor, improves classification accuracy. Further, we experimentally validate that the sensor can classify texture with roughness depths as low as $2.5μm$ at an accuracy of $90\%$ or more and segregate materials based on their roughness and hardness. We present a novel metric to consider while designing a whisker sensor to guarantee the quality of texture data acquisition beforehand. The machine learning model performance was validated against the data collected from the laser sensor from the same set of surface textures. As part of our work, we are releasing two-dimensional texture data: roughness and hardness to the research community.
85.4CVApr 17
Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMsSai Srinivas Kancheti, Aditya Sanjiv Kanade, Vineeth N. Balasubramanian et al.
Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However, we show that this paradigm struggles with generalized spatial intelligence. We perform a comprehensive evaluation of seventeen models across thirteen spatial benchmarks and identify a critical gap: CoT prompting consistently degrades performance in visual spatial reasoning. Furthermore, through a novel No-Image++ ablation, we demonstrate that MRMs and CoT prompted MLMs suffer from severe shortcut learning, and hallucinate visual details from textual priors even when the image is absent. These findings challenge the efficacy of text-only CoT for spatial tasks and underscore the need for vision-centric reasoning paradigms.
CVSep 18, 2022
VisTaNet: Attention Guided Deep Fusion for Surface Roughness ClassificationPrasanna Kumar Routray, Aditya Sanjiv Kanade, Jay Bhanushali et al.
Human texture perception is a weighted average of multi-sensory inputs: visual and tactile. While the visual sensing mechanism extracts global features, the tactile mechanism complements it by extracting local features. The lack of coupled visuotactile datasets in the literature is a challenge for studying multimodal fusion strategies analogous to human texture perception. This paper presents a visual dataset that augments an existing tactile dataset. We propose a novel deep fusion architecture that fuses visual and tactile data using four types of fusion strategies: summation, concatenation, max-pooling, and attention. Our model shows significant performance improvements (97.22%) in surface roughness classification accuracy over tactile only (SVM - 92.60%) and visual only (FENet-50 - 85.01%) architectures. Among the several fusion techniques, attention-guided architecture results in better classification accuracy. Our study shows that analogous to human texture perception, the proposed model chooses a weighted combination of the two modalities (visual and tactile), thus resulting in higher surface roughness classification accuracy; and it chooses to maximize the weightage of the tactile modality where the visual modality fails and vice-versa.
AIMay 30, 2025
MIR: Methodology Inspiration Retrieval for Scientific Research ProblemsAniketh Garikaparthi, Manasi Patwardhan, Aditya Sanjiv Kanade et al.
There has been a surge of interest in harnessing the reasoning capabilities of Large Language Models (LLMs) to accelerate scientific discovery. While existing approaches rely on grounding the discovery process within the relevant literature, effectiveness varies significantly with the quality and nature of the retrieved literature. We address the challenge of retrieving prior work whose concepts can inspire solutions for a given research problem, a task we define as Methodology Inspiration Retrieval (MIR). We construct a novel dataset tailored for training and evaluating retrievers on MIR, and establish baselines. To address MIR, we build the Methodology Adjacency Graph (MAG); capturing methodological lineage through citation relationships. We leverage MAG to embed an "intuitive prior" into dense retrievers for identifying patterns of methodological inspiration beyond superficial semantic similarity. This achieves significant gains of +5.4 in Recall@3 and +7.8 in Mean Average Precision (mAP) over strong baselines. Further, we adapt LLM-based re-ranking strategies to MIR, yielding additional improvements of +4.5 in Recall@3 and +4.8 in mAP. Through extensive ablation studies and qualitative analyses, we exhibit the promise of MIR in enhancing automated scientific discovery and outline avenues for advancing inspiration-driven retrieval.