AIJan 9
ART: Adaptive Reasoning Trees for Explainable Claim VerificationSahil Wadhwa, Himanshu Kumar, Guanqun Yang et al.
Large Language Models (LLMs) are powerful candidates for complex decision-making, leveraging vast encoded knowledge and remarkable zero-shot abilities. However, their adoption in high-stakes environments is hindered by their opacity; their outputs lack faithful explanations and cannot be effectively contested to correct errors, undermining trustworthiness. In this paper, we propose ART (Adaptive Reasoning Trees), a hierarchical method for claim verification. The process begins with a root claim, which branches into supporting and attacking child arguments. An argument's strength is determined bottom-up via a pairwise tournament of its children, adjudicated by a judge LLM, allowing a final, transparent and contestable verdict to be systematically derived which is missing in methods like Chain-of-Thought (CoT). We empirically validate ART on multiple datasets, analyzing different argument generators and comparison strategies. Our findings show that ART's structured reasoning outperforms strong baselines, establishing a new benchmark for explainable claim verification which is more reliable and ensures clarity in the overall decision making step.
CVMar 14, 2025
Non Line-of-Sight Optical Wireless Communication using Neuromorphic CamerasAbbaas Alif Mohamed Nishar, Alireza Marefat, Ashwin Ashok
Neuromorphic or event cameras, inspired by biological vision systems, capture changes in illumination with high temporal resolution and efficiency, producing streams of events rather than traditional images. In this paper, we explore the use of neuromorphic cameras for passive optical wireless communication (OWC), leveraging their asynchronous detection of illumination changes to decode data transmitted through reflections of light from objects. We propose a novel system that utilizes neuromorphic cameras for passive visible light communication (VLC), extending the concept to Non Line-of-Sight (NLoS) scenarios through passive reflections from everyday objects. Our experiments demonstrate the feasibility and advantages of using neuromorphic cameras for VLC, characterizing the performance of various modulation schemes, including traditional On-Off Keying (OOK) and advanced N-pulse modulation. We introduce an adaptive N-pulse modulation scheme that dynamically adjusts encoding based on the packet's bit composition, achieving higher data rates and robustness in different scenarios. Our results show that lighter-colored, glossy objects are better for NLoS communication, while larger objects and those with matte finishes experience higher error rates due to multipath reflections.
NIFeb 10, 2025
Text2Net: Transforming Plain-text To A Dynamic Interactive Network Simulation EnvironmentAlireza Marefat, Abbaas Alif Mohamed Nishar, Ashwin Ashok
This paper introduces Text2Net, an innovative text-based network simulation engine that leverages natural language processing (NLP) and large language models (LLMs) to transform plain-text descriptions of network topologies into dynamic, interactive simulations. Text2Net simplifies the process of configuring network simulations, eliminating the need for users to master vendor-specific syntaxes or navigate complex graphical interfaces. Through qualitative and quantitative evaluations, we demonstrate Text2Net's ability to significantly reduce the time and effort required to deploy network scenarios compared to traditional simulators like EVE-NG. By automating repetitive tasks and enabling intuitive interaction, Text2Net enhances accessibility for students, educators, and professionals. The system facilitates hands-on learning experiences for students that bridge the gap between theoretical knowledge and practical application. The results showcase its scalability across various network complexities, marking a significant step toward revolutionizing network education and professional use cases, such as proof-of-concept testing.
CVFeb 11
Stress Tests REVEAL Fragile Temporal and Visual Grounding in Video-Language ModelsSethuraman T, Savya Khosla, Aditi Tiwari et al.
This work investigates a fundamental question: Do Video-Language Models (VidLMs) robustly account for video content, temporal sequence, and motion? Our investigation shows that, surprisingly, they often do not. We introduce REVEAL{}, a diagnostic benchmark that probes fundamental weaknesses of contemporary VidLMs through five controlled stress tests; assessing temporal expectation bias, reliance on language-only shortcuts, video sycophancy, camera motion sensitivity, and robustness to spatiotemporal occlusion. We test leading open- and closed-source VidLMs and find that these models confidently describe reversed scenes as forward, answer questions while neglecting video content, agree with false claims, struggle with basic camera motion, and fail to aggregate temporal information amidst simple spatiotemporal masking. Humans, on the other hand, succeed at these tasks with ease. Alongside our benchmark, we provide a data pipeline that automatically generates diagnostic examples for our stress tests, enabling broader and more scalable evaluation. We will release our benchmark and code to support future research.
MMJan 4, 2025
Revelio: A Real-World Screen-Camera Communication System with Visually Imperceptible Data EmbeddingAbbaas Alif Mohamed Nishar, Shrinivas Kudekar, Bernard Kintzing et al.
We present `Revelio', a real-world screen-camera communication system leveraging temporal flicker fusion in the OKLAB color space. Using spatially-adaptive flickering and encoding information in pixel region shapes, Revelio achieves visually imperceptible data embedding while remaining robust against noise, asynchronicity, and distortions in screen-camera channels, ensuring reliable decoding by standard smartphone cameras. The decoder, driven by a two-stage neural network, uses a weighted differential accumulator for precise frame detection and symbol recognition. Initial experiments demonstrate Revelio's effectiveness in interactive television, offering an unobtrusive method for meta-information transmission.