CVSep 6, 2022
Low-Energy Convolutional Neural Networks (CNNs) using Hadamard MethodVarun Mannam
The growing demand for the internet of things (IoT) makes it necessary to implement computer vision tasks such as object recognition in low-power devices. Convolutional neural networks (CNNs) are a potential approach for object recognition and detection. However, the convolutional layer in CNN consumes significant energy compared to the fully connected layers. To mitigate this problem, a new approach based on the Hadamard transformation as an alternative to the convolution operation is demonstrated using two fundamental datasets, MNIST and CIFAR10. The mathematical expression of the Hadamard method shows the clear potential to save energy consumption compared to convolutional layers, which are helpful with BigData applications. In addition, to the test accuracy of the MNIST dataset, the Hadamard method performs similarly to the convolution method. In contrast, with the CIFAR10 dataset, test data accuracy is dropped (due to complex data and multiple channels) compared to the convolution method. Finally, the demonstrated method is helpful for other computer vision tasks when the kernel size is smaller than the input image size.
AIOct 25, 2024
Knowledge Graph Enhanced Language Agents for RecommendationTaicheng Guo, Chaochun Liu, Hai Wang et al.
Language agents have recently been used to simulate human behavior and user-item interactions for recommendation systems. However, current language agent simulations do not understand the relationships between users and items, leading to inaccurate user profiles and ineffective recommendations. In this work, we explore the utility of Knowledge Graphs (KGs), which contain extensive and reliable relationships between users and items, for recommendation. Our key insight is that the paths in a KG can capture complex relationships between users and items, eliciting the underlying reasons for user preferences and enriching user profiles. Leveraging this insight, we propose Knowledge Graph Enhanced Language Agents(KGLA), a framework that unifies language agents and KG for recommendation systems. In the simulated recommendation scenario, we position the user and item within the KG and integrate KG paths as natural language descriptions into the simulation. This allows language agents to interact with each other and discover sufficient rationale behind their interactions, making the simulation more accurate and aligned with real-world cases, thus improving recommendation performance. Our experimental results show that KGLA significantly improves recommendation performance (with a 33%-95% boost in NDCG@1 among three widely used benchmarks) compared to the previous best baseline method.
AIJun 22, 2025
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent TablesVarun Mannam, Fang Wang, Chaochun Liu et al.
In talent management systems, critical information often resides in complex tabular formats, presenting significant retrieval challenges for conventional language models. These challenges are pronounced when processing Talent documentation that requires precise interpretation of tabular relationships for accurate information retrieval and downstream decision-making. Current table extraction methods struggle with semantic understanding, resulting in poor performance when integrated into retrieval-augmented chat applications. This paper identifies a key bottleneck - while structural table information can be extracted, the semantic relationships between tabular elements are lost, causing downstream query failures. To address this, we introduce TalentMine, a novel LLM-enhanced framework that transforms extracted tables into semantically enriched representations. Unlike conventional approaches relying on CSV or text linearization, our method employs specialized multimodal reasoning to preserve both structural and semantic dimensions of tabular data. Experimental evaluation across employee benefits document collections demonstrates TalentMine's superior performance, achieving 100% accuracy in query answering tasks compared to 0% for standard AWS Textract extraction and 40% for AWS Textract Visual Q&A capabilities. Our comparative analysis also reveals that the Claude v3 Haiku model achieves optimal performance for talent management applications. The key contributions of this work include (1) a systematic analysis of semantic information loss in current table extraction pipelines, (2) a novel LLM-based method for semantically enriched table representation, (3) an efficient integration framework for retrieval-augmented systems as end-to-end systems, and (4) comprehensive benchmarks on talent analytics tasks showing substantial improvements across multiple categories.
IRJun 19, 2025
Evaluating VisualRAG: Quantifying Cross-Modal Performance in Enterprise Document UnderstandingVarun Mannam, Fang Wang, Xin Chen
Current evaluation frameworks for multimodal generative AI struggle to establish trustworthiness, hindering enterprise adoption where reliability is paramount. We introduce a systematic, quantitative benchmarking framework to measure the trustworthiness of progressively integrating cross-modal inputs such as text, images, captions, and OCR within VisualRAG systems for enterprise document intelligence. Our approach establishes quantitative relationships between technical metrics and user-centric trust measures. Evaluation reveals that optimal modality weighting with weights of 30% text, 15% image, 25% caption, and 30% OCR improves performance by 57.3% over text-only baselines while maintaining computational efficiency. We provide comparative assessments of foundation models, demonstrating their differential impact on trustworthiness in caption generation and OCR extraction-a vital consideration for reliable enterprise AI. This work advances responsible AI deployment by providing a rigorous framework for quantifying and enhancing trustworthiness in multimodal RAG for critical enterprise applications.
CVJun 17, 2025
Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction AnalysisVarun Mannam, Zhenyu Shi
Accurate video annotation plays a vital role in modern retail applications, including customer behavior analysis, product interaction detection, and in-store activity recognition. However, conventional annotation methods heavily rely on time-consuming manual labeling by human annotators, introducing non-robust frame selection and increasing operational costs. To address these challenges in the retail domain, we propose a deep learning-based approach that automates key-frame identification in retail videos and provides automatic annotations of products and customers. Our method leverages deep neural networks to learn discriminative features by embedding video frames and incorporating object detection-based techniques tailored for retail environments. Experimental results showcase the superiority of our approach over traditional methods, achieving accuracy comparable to human annotator labeling while enhancing the overall efficiency of retail video annotation. Remarkably, our approach leads to an average of 2 times cost savings in video annotation. By allowing human annotators to verify/adjust less than 5% of detected frames in the video dataset, while automating the annotation process for the remaining frames without reducing annotation quality, retailers can significantly reduce operational costs. The automation of key-frame detection enables substantial time and effort savings in retail video labeling tasks, proving highly valuable for diverse retail applications such as shopper journey analysis, product interaction detection, and in-store security monitoring.
IVJan 3, 2022
Low dosage 3D volume fluorescence microscopy imaging using compressive sensingVarun Mannam, Jacob Brandt, Cody J. Smith et al.
Fluorescence microscopy has been a significant tool to observe long-term imaging of embryos (in vivo) growth over time. However, cumulative exposure is phototoxic to such sensitive live samples. While techniques like light-sheet fluorescence microscopy (LSFM) allow for reduced exposure, it is not well suited for deep imaging models. Other computational techniques are computationally expensive and often lack restoration quality. To address this challenge, one can use various low-dosage imaging techniques that are developed to achieve the 3D volume reconstruction using a few slices in the axial direction (z-axis); however, they often lack restoration quality. Also, acquiring dense images (with small steps) in the axial direction is computationally expensive. To address this challenge, we present a compressive sensing (CS) based approach to fully reconstruct 3D volumes with the same signal-to-noise ratio (SNR) with less than half of the excitation dosage. We present the theory and experimentally validate the approach. To demonstrate our technique, we capture a 3D volume of the RFP labeled neurons in the zebrafish embryo spinal cord (30um thickness) with the axial sampling of 0.1um using a confocal microscope. From the results, we observe the CS-based approach achieves accurate 3D volume reconstruction from less than 20% of the entire stack optical sections. The developed CS-based methodology in this work can be easily applied to other deep imaging modalities such as two-photon and light-sheet microscopy, where reducing sample photo-toxicity is a critical challenge.
IVMar 7, 2021
Convolutional Neural Network Denoising in Fluorescence Lifetime Imaging Microscopy (FLIM)Varun Mannam, Yide Zhang, Xiaotong Yuan et al.
Fluorescence lifetime imaging microscopy (FLIM) systems are limited by their slow processing speed, low signal-to-noise ratio (SNR), and expensive and challenging hardware setups. In this work, we demonstrate applying a denoising convolutional network to improve FLIM SNR. The network will be integrated with an instant FLIM system with fast data acquisition based on analog signal processing, high SNR using high-efficiency pulse-modulation, and cost-effective implementation utilizing off-the-shelf radio-frequency components. Our instant FLIM system simultaneously provides the intensity, lifetime, and phasor plots \textit{in vivo} and \textit{ex vivo}. By integrating image denoising using the trained deep learning model on the FLIM data, provide accurate FLIM phasor measurements are obtained. The enhanced phasor is then passed through the K-means clustering segmentation method, an unbiased and unsupervised machine learning technique to separate different fluorophores accurately. Our experimental \textit{in vivo} mouse kidney results indicate that introducing the deep learning image denoising model before the segmentation effectively removes the noise in the phasor compared to existing methods and provides clearer segments. Hence, the proposed deep learning-based workflow provides fast and accurate automatic segmentation of fluorescence images using instant FLIM. The denoising operation is effective for the segmentation if the FLIM measurements are noisy. The clustering can effectively enhance the detection of biological structures of interest in biomedical imaging applications.
IVMar 7, 2021
Deep learning-based super-resolution fluorescence microscopy on small datasetsVarun Mannam, Yide Zhang, Xiaotong Yuan et al.
Fluorescence microscopy has enabled a dramatic development in modern biology by visualizing biological organisms with micrometer scale resolution. However, due to the diffraction limit, sub-micron/nanometer features are difficult to resolve. While various super-resolution techniques are developed to achieve nanometer-scale resolution, they often either require expensive optical setup or specialized fluorophores. In recent years, deep learning has shown the potentials to reduce the technical barrier and obtain super-resolution from diffraction-limited images. For accurate results, conventional deep learning techniques require thousands of images as a training dataset. Obtaining large datasets from biological samples is not often feasible due to the photobleaching of fluorophores, phototoxicity, and dynamic processes occurring within the organism. Therefore, achieving deep learning-based super-resolution using small datasets is challenging. We address this limitation with a new convolutional neural network-based approach that is successfully trained with small datasets and achieves super-resolution images. We captured 750 images in total from 15 different field-of-views as the training dataset to demonstrate the technique. In each FOV, a single target image is generated using the super-resolution radial fluctuation method. As expected, this small dataset failed to produce a usable model using traditional super-resolution architecture. However, using the new approach, a network can be trained to achieve super-resolution images from this small dataset. This deep learning model can be applied to other biomedical imaging modalities such as MRI and X-ray imaging, where obtaining large training datasets is challenging.
IVSep 24, 2020
Packet Compressed Sensing Imaging (PCSI): Robust Image Transmission over Noisy ChannelsScott Howard, Grant Barthelmes, Cara Ravasio et al.
Packet Compressed Sensing Imaging (PCSI) is digital unconnected image transmission method resilient to packet loss. The goal is to develop a robust image transmission method that is computationally trivial to transmit (e.g., compatible with low-power 8-bit microcontrollers) and well suited for weak signal environments where packets are likely to be lost. In other image transmission techniques, noise and packet loss leads to parts of the image being distorted or missing. In PCSI, every packet contains random pixel information from the entire image, and each additional packet received (in any order) simply enhances image quality. Satisfactory SSTV resolution (320x240 pixel) images can be received in ~1-2 minutes when transmitted at 1200 baud AFSK, which is on par with analog SSTV transmission time. Image transmission and reception can occur simultaneously on a computer, and multiple images can be received from multiple stations simultaneously - allowing for the creation of "image nets." This paper presents a simple computer application for Windows, Mac, and Linux that implements PCSI transmission and reception on any KISS compatible hardware or software modem on any band and digital mode.
IVAug 5, 2020
Machine learning for faster and smarter fluorescence lifetime imaging microscopyVarun Mannam, Yide Zhang, Xiaotong Yuan et al.
Fluorescence lifetime imaging microscopy (FLIM) is a powerful technique in biomedical research that uses the fluorophore decay rate to provide additional contrast in fluorescence microscopy. However, at present, the calculation, analysis, and interpretation of FLIM is a complex, slow, and computationally expensive process. Machine learning (ML) techniques are well suited to extract and interpret measurements from multi-dimensional FLIM data sets with substantial improvement in speed over conventional methods. In this topical review, we first discuss the basics of FILM and ML. Second, we provide a summary of lifetime extraction strategies using ML and its applications in classifying and segmenting FILM images with higher accuracy compared to conventional methods. Finally, we discuss two potential directions to improve FLIM with ML with proof of concept demonstrations.
LGFeb 26, 2020
Performance Analysis of Semi-supervised Learning in the Small-data Regime using VAEsVarun Mannam, Arman Kazemi
Extracting large amounts of data from biological samples is not feasible due to radiation issues, and image processing in the small-data regime is one of the critical challenges when working with a limited amount of data. In this work, we applied an existing algorithm named Variational Auto Encoder (VAE) that pre-trains a latent space representation of the data to capture the features in a lower-dimension for the small-data regime input. The fine-tuned latent space provides constant weights that are useful for classification. Here we will present the performance analysis of the VAE algorithm with different latent space sizes in the semi-supervised learning using the CIFAR-10 dataset.