Sohini Roychowdhury

CV
h-index18
25papers
283citations
Novelty41%
AI Score39

25 Papers

CLNov 18, 2023
Journey of Hallucination-minimized Generative AI Solutions for Financial Decision Makers

Sohini Roychowdhury

Generative AI has significantly reduced the entry barrier to the domain of AI owing to the ease of use and core capabilities of automation, translation, and intelligent actions in our day to day lives. Currently, Large language models (LLMs) that power such chatbots are being utilized primarily for their automation capabilities for software monitoring, report generation etc. and for specific personalized question answering capabilities, on a limited scope and scale. One major limitation of the currently evolving family of LLMs is 'hallucinations', wherein inaccurate responses are reported as factual. Hallucinations are primarily caused by biased training data, ambiguous prompts and inaccurate LLM parameters, and they majorly occur while combining mathematical facts with language-based context. Thus, monitoring and controlling for hallucinations becomes necessary when designing solutions that are meant for decision makers. In this work we present the three major stages in the journey of designing hallucination-minimized LLM-based solutions that are specialized for the decision makers of the financial domain, namely: prototyping, scaling and LLM evolution using human feedback. These three stages and the novel data to answer generation modules presented in this work are necessary to ensure that the Generative AI chatbots, autonomous reports and alerts are reliable and high-quality to aid key decision-making processes.

CLNov 9, 2023
Hallucination-minimized Data-to-answer Framework for Financial Decision-makers

Sohini Roychowdhury, Andres Alvarez, Brian Moore et al.

Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. However, scaling such prototypes to robust products with minimized hallucinations or fake responses still remains an open challenge, especially in niche data-table heavy domains such as financial decision making. In this work, we present a novel Langchain-based framework that transforms data tables into hierarchical textual data chunks to enable a wide variety of actionable question answering. First, the user-queries are classified by intention followed by automated retrieval of the most relevant data chunks to generate customized LLM prompts per query. Next, the custom prompts and their responses undergo multi-metric scoring to assess for hallucinations and response confidence. The proposed system is optimized with user-query intention classification, advanced prompting, data scaling capabilities and it achieves over 90% confidence scores for a variety of user-queries responses ranging from {What, Where, Why, How, predict, trend, anomalies, exceptions} that are crucial for financial decision making applications. The proposed data to answers framework can be extended to other analytical domains such as sales and payroll to ensure optimal hallucination control guardrails.

CVMar 25, 2022
Semi-supervised and Deep learning Frameworks for Video Classification and Key-frame Identification

Sohini Roychowdhury

Automating video-based data and machine learning pipelines poses several challenges including metadata generation for efficient storage and retrieval and isolation of key-frames for scene understanding tasks. In this work, we present two semi-supervised approaches that automate this process of manual frame sifting in video streams by automatically classifying scenes for content and filtering frames for fine-tuning scene understanding tasks. The first rule-based method starts from a pre-trained object detector and it assigns scene type, uncertainty and lighting categories to each frame based on probability distributions of foreground objects. Next, frames with the highest uncertainty and structural dissimilarity are isolated as key-frames. The second method relies on the simCLR model for frame encoding followed by label-spreading from 20% of frame samples to label the remaining frames for scene and lighting categories. Also, clustering the video frames in the encoded feature space further isolates key-frames at cluster boundaries. The proposed methods achieve 64-93% accuracy for automated scene categorization for outdoor image videos from public domain datasets of JAAD and KITTI. Also, less than 10% of all input frames can be filtered as key-frames that can then be sent for annotation and fine tuning of machine vision algorithms. Thus, the proposed framework can be scaled to additional video data streams for automated training of perception-driven systems with minimal training images.

IVApr 5, 2023
NUMSnet: Nested-U Multi-class Segmentation network for 3D Medical Image Stacks

Sohini Roychowdhury

Semantic segmentation for medical 3D image stacks enables accurate volumetric reconstructions, computer-aided diagnostics and follow up treatment planning. In this work, we present a novel variant of the Unet model called the NUMSnet that transmits pixel neighborhood features across scans through nested layers to achieve accurate multi-class semantic segmentations with minimal training data. We analyze the semantic segmentation performance of the NUMSnet model in comparison with several Unet model variants to segment 3-7 regions of interest using only 10% of images for training per Lung-CT and Heart-CT volumetric image stacks. The proposed NUMSnet model achieves up to 20% improvement in segmentation recall with 4-9% improvement in Dice scores for Lung-CT stacks and 2.5-10% improvement in Dice scores for Heart-CT stacks when compared to the Unet++ model. The NUMSnet model needs to be trained by ordered images around the central scan of each volumetric stack. Propagation of image feature information from the 6 nested layers of the Unet++ model are found to have better computation and segmentation performances than propagation of all up-sampling layers in a Unet++ model. The NUMSnet model achieves comparable segmentation performances to existing works, while being trained on as low as 5\% of the training images. Also, transfer learning allows faster convergence of the NUMSnet model for multi-class semantic segmentation from pathology in Lung-CT images to cardiac segmentations in Heart-CT stacks. Thus, the proposed model can standardize multi-class semantic segmentation on a variety of volumetric image stacks with minimal training dataset. This can significantly reduce the cost, time and inter-observer variabilities associated with computer-aided detections and treatment.

CLAug 9, 2024
FiSTECH: Financial Style Transfer to Enhance Creativity without Hallucinations in LLMs

Sohini Roychowdhury, Marko Krema, Brian Moore et al.

Recent trends in Generative AI have emerged towards fine-tuning foundational large language models (LLMs) to create domain-specific LLMs for automation and chatbot-like applications. Specialized applications for analytics-heavy domains such as Financial report generation require specific writing styles that comprise compound and creative sentences with minimized hallucinations. In this work, we explore the self-corrective auto-regressive qualities of LLMs to learn creativity in writing styles with minimal prompting. We propose a novel two-stage fine-tuning (FT) strategy wherein in the first stage public domain financial reports are used to train for writing styles while allowing the LLM to hallucinate. In the second stage the examples of hallucinations are manually corrected and further used to fine-tune the LLM. The finally trained LLM learns to generate specific financial report sections using minimal instructions and tabular data inputs while ensuring low fine-tuning costs. Our proposed two-stage fine-tuning boosts the accuracy of financial questions answering by two-folds while reducing hallucinations by over 50%. Also, the fine-tuned model has lower perplexity, improved ROUGE, TER and BLEU scores, higher creativity and knowledge density with lower uncertainty and cross entropy than base LLMs. Thus, the proposed framework can be generalized to train creativity in LLMs by first allowing them to hallucinate.

AIMay 7, 2024
ERATTA: Extreme RAG for Table To Answers with Large Language Models

Sohini Roychowdhury, Marko Krema, Anvar Mahammad et al.

Large language models (LLMs) with retrieval augmented-generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. Although RAG implemented with AI agents (agentic-RAG) has been recently popularized, its suffers from unstable cost and unreliable performances for Enterprise-level data-practices. Most existing use-cases that incorporate RAG with LLMs have been either generic or extremely domain specific, thereby questioning the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system where multiple LLMs can be invoked to enable data authentication, user-query routing, data-retrieval and custom prompting for question-answering capabilities from Enterprise-data tables. The source tables here are highly fluctuating and large in size and the proposed framework enables structured responses in under 10 seconds per query. Additionally, we propose a five metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.

LGNov 23, 2025
DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations

Sohini Roychowdhury, Adam Holeman, Mohammad Amin et al.

For online ad-recommendation systems, processing complete user-ad-engagement histories is both computationally intensive and noise-prone. We introduce Dynamix, a scalable, personalized sequence exploration framework that optimizes event history processing using maximum relevance principles and self-supervised learning through Event Based Features (EBFs). Dynamix categorizes users-engagements at session and surface-levels by leveraging correlations between dwell-times and ad-conversion events. This enables targeted, event-level feature removal and selective feature boosting for certain user-segments, thereby yielding training and inference efficiency wins without sacrificing engaging ad-prediction accuracy. While, dynamic resource removal increases training and inference throughput by 1.15% and 1.8%, respectively, dynamic feature boosting provides 0.033 NE gains while boosting inference QPS by 4.2% over baseline models. These results demonstrate that Dynamix achieves significant cost efficiency and performance improvements in online user-sequence based recommendation models. Self-supervised user-segmentation and resource exploration can further boost complex feature selection strategies while optimizing for workflow and compute resources.

LGJun 20, 2025
SIDE: Semantic ID Embedding for effective learning from sequences

Dinesh Ramasamy, Shakti Kumar, Chris Cadonic et al.

Sequence-based recommendations models are driving the state-of-the-art for industrial ad-recommendation systems. Such systems typically deal with user histories or sequence lengths ranging in the order of O(10^3) to O(10^4) events. While adding embeddings at this scale is manageable in pre-trained models, incorporating them into real-time prediction models is challenging due to both storage and inference costs. To address this scaling challenge, we propose a novel approach that leverages vector quantization (VQ) to inject a compact Semantic ID (SID) as input to the recommendation models instead of a collection of embeddings. Our method builds on recent works of SIDs by introducing three key innovations: (i) a multi-task VQ-VAE framework, called VQ fusion that fuses multiple content embeddings and categorical predictions into a single Semantic ID; (ii) a parameter-free, highly granular SID-to-embedding conversion technique, called SIDE, that is validated with two content embedding collections, thereby eliminating the need for a large parameterized lookup table; and (iii) a novel quantization method called Discrete-PCA (DPCA) which generalizes and enhances residual quantization techniques. The proposed enhancements when applied to a large-scale industrial ads-recommendation system achieves 2.4X improvement in normalized entropy (NE) gain and 3X reduction in data footprint compared to traditional SID methods.

IVOct 27, 2021
QU-net++: Image Quality Detection Framework for Segmentation of Medical 3D Image Stacks

Sohini Roychowdhury

Automated segmentation of pathological regions of interest aids medical image diagnostics and follow-up care. However, accurate pathological segmentations require high quality of annotated data that can be both cost and time intensive to generate. In this work, we propose an automated two-step method that detects a minimal image subset required to train segmentation models by evaluating the quality of medical images from 3D image stacks using a U-net++ model. These images that represent a lack of quality training can then be annotated and used to fully train a U-net-based segmentation model. The proposed QU-net++ model detects this lack of quality training based on the disagreement in segmentations produced from the final two output layers. The proposed model isolates around 10% of the slices per 3D image stack and can scale across imaging modalities to segment cysts in OCT images and ground glass opacity (GGO) in lung CT images with Dice scores in the range 0.56-0.72. Thus, the proposed method can be applied for cost effective multi-modal pathology segmentation tasks.

CVOct 15, 2021
Video-Data Pipelines for Machine Learning Applications

Sohini Roychowdhury, James Y. Sato

Data pipelines are an essential component for end-to-end solutions that take machine learning algorithms to production. Engineering data pipelines for video-sequences poses several challenges including isolation of key-frames from video sequences that are high quality and represent significant variations in the scene. Manual isolation of such quality key-frames can take hours of sifting through hours worth of video data. In this work, we present a data pipeline framework that can automate this process of manual frame sifting in video sequences by controlling the fraction of frames that can be removed based on image quality and content type. Additionally, the frames that are retained can be automatically tagged per sequence, thereby simplifying the process of automated data retrieval for future ML model deployments. We analyze the performance of the proposed video-data pipeline for versioned deployment and monitoring for object detection algorithms that are trained on outdoor autonomous driving video sequences. The proposed video-data pipeline can retain anywhere between 0.1-20% of the all input frames that are representative of high image quality and high variations in content. This frame selection, automated scene tagging followed by model verification can be completed in under 30 seconds for 22 video-sequences under analysis in this work. Thus, the proposed framework can be scaled to additional video-sequence data sets for automating ML versioned deployments.

CVFeb 23, 2021
SISE-PC: Semi-supervised Image Subsampling for Explainable Pathology

Sohini Roychowdhury, Kwok Sun Tang, Mohith Ashok et al.

Although automated pathology classification using deep learning (DL) has proved to be predictively efficient, DL methods are found to be data and compute cost intensive. In this work, we aim to reduce DL training costs by pre-training a Resnet feature extractor using SimCLR contrastive loss for latent encoding of OCT images. We propose a novel active learning framework that identifies a minimal sub-sampled dataset containing the most uncertain OCT image samples using label propagation on the SimCLR latent encodings. The pre-trained Resnet model is then fine-tuned with the labelled minimal sub-sampled data and the underlying pathological sites are visually explained. Our framework identifies upto 2% of OCT images to be most uncertain that need prioritized specialist attention and that can fine-tune a Resnet model to achieve upto 97% classification accuracy. The proposed method can be extended to other medical images to minimize prediction costs.

LGFeb 2, 2021
OPAM: Online Purchasing-behavior Analysis using Machine learning

Sohini Roychowdhury, Ebrahim Alareqi, Wenxi Li

Customer purchasing behavior analysis plays a key role in developing insightful communication strategies between online vendors and their customers. To support the recent increase in online shopping trends, in this work, we present a customer purchasing behavior analysis system using supervised, unsupervised and semi-supervised learning methods. The proposed system analyzes session and user-journey level purchasing behaviors to identify customer categories/clusters that can be useful for targeted consumer insights at scale. We observe higher sensitivity to the design of online shopping portals for session-level purchasing prediction with accuracy/recall in range 91-98%/73-99%, respectively. The user-journey level analysis demonstrates five unique user clusters, wherein 'New Shoppers' are most predictable and 'Impulsive Shoppers' are most unique with low viewing and high carting behaviors for purchases. Further, cluster transformation metrics and partial label learning demonstrates the robustness of each user cluster to new/unlabelled events. Thus, customer clusters can aid strategic targeted nudge models.

CVDec 5, 2020
Cirrus: A Long-range Bi-pattern LiDAR Dataset

Ze Wang, Sihao Ding, Ying Li et al.

In this paper, we introduce Cirrus, a new long-range bi-pattern LiDAR public dataset for autonomous driving tasks such as 3D object detection, critical to highway driving and timely decision making. Our platform is equipped with a high-resolution video camera and a pair of LiDAR sensors with a 250-meter effective range, which is significantly longer than existing public datasets. We record paired point clouds simultaneously using both Gaussian and uniform scanning patterns. Point density varies significantly across such a long range, and different scanning patterns further diversify object representation in LiDAR. In Cirrus, eight categories of objects are exhaustively annotated in the LiDAR point clouds for the entire effective range. To illustrate the kind of studies supported by this new dataset, we introduce LiDAR model adaptation across different ranges, scanning patterns, and sensor devices. Promising results show the great potential of this new dataset to the robotics and computer vision communities.

LGOct 6, 2020
Categorizing Online Shopping Behavior from Cosmetics to Electronics: An Analytical Framework

Sohini Roychowdhury, Wenxi Li, Ebrahim Alareqi et al.

A success factor for modern companies in the age of Digital Marketing is to understand how customers think and behave based on their online shopping patterns. While the conventional method of gathering consumer insights through questionnaires and surveys still form the bases of descriptive analytics for market intelligence units, we propose a machine learning framework to automate this process. In this paper we present a modular consumer data analysis platform that processes session level interaction records between users and products to predict session level, user journey level and customer behavior specific patterns leading towards purchase events. We explore the computational framework and provide test results on two Big data sets-cosmetics and consumer electronics of size 2GB and 15GB, respectively. The proposed system achieves 97-99% classification accuracy and recall for user-journey level purchase predictions and categorizes buying behavior into 5 clusters with increasing purchase ratios for both data sets. Thus, the proposed framework is extendable to other large e-commerce data sets to obtain automated purchase predictions and descriptive consumer insights.

CVAug 7, 2020
Few Shot Learning Framework to Reduce Inter-observer Variability in Medical Images

Sohini Roychowdhury

Most computer aided pathology detection systems rely on large volumes of quality annotated data to aid diagnostics and follow up procedures. However, quality assuring large volumes of annotated medical image data can be subjective and expensive. In this work we present a novel standardization framework that implements three few-shot learning (FSL) models that can be iteratively trained by atmost 5 images per 3D stack to generate multiple regional proposals (RPs) per test image. These FSL models include a novel parallel echo state network (ParESN) framework and an augmented U-net model. Additionally, we propose a novel target label selection algorithm (TLSA) that measures relative agreeability between RPs and the manually annotated target labels to detect the "best" quality annotation per image. Using the FSL models, our system achieves 0.28-0.64 Dice coefficient across vendor image stacks for intra-retinal cyst segmentation. Additionally, the TLSA is capable of automatically classifying high quality target labels from their noisy counterparts for 60-97% of the images while ensuring manual supervision on remaining images. Also, the proposed framework with ParESN model minimizes manual annotation checking to 12-28% of the total number of images. The TLSA metrics further provide confidence scores for the automated annotation quality assurance. Thus, the proposed framework is flexible to extensions for quality image annotation curation of other image stacks as well.

CVMay 15, 2020
FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

Francesco Piccoli, Rajarathnam Balakrishnan, Maria Jesus Perez et al.

Pedestrian intention recognition is very important to develop robust and safe autonomous driving (AD) and advanced driver assistance systems (ADAS) functionalities for urban driving. In this work, we develop an end-to-end pedestrian intention framework that performs well on day- and night- time scenarios. Our framework relies on objection detection bounding boxes combined with skeletal features of human pose. We study early, late, and combined (early and late) fusion mechanisms to exploit the skeletal features and reduce false positives as well to improve the intention prediction performance. The early fusion mechanism results in AP of 0.89 and precision/recall of 0.79/0.89 for pedestrian intention classification. Furthermore, we propose three new metrics to properly evaluate the pedestrian intention systems. Under these new evaluation metrics for the intention prediction, the proposed end-to-end network offers accurate pedestrian intention up to half a second ahead of the actual risky maneuver.

CVSep 26, 2019
Range Adaptation for 3D Object Detection in LiDAR

Ze Wang, Sihao Ding, Ying Li et al.

LiDAR-based 3D object detection plays a crucial role in modern autonomous driving systems. LiDAR data often exhibit severe changes in properties across different observation ranges. In this paper, we explore cross-range adaptation for 3D object detection using LiDAR, i.e., far-range observations are adapted to near-range. This way, far-range detection is optimized for similar performance to near-range one. We adopt a bird-eyes view (BEV) detection framework to perform the proposed model adaptation. Our model adaptation consists of an adversarial global adaptation, and a fine-grained local adaptation. The proposed cross range adaptation framework is validated on three state-of-the-art LiDAR based object detection networks, and we consistently observe performance improvement on the far-range objects, without adding any auxiliary parameters to the model. To the best of our knowledge, this paper is the first attempt to study cross-range LiDAR adaptation for object detection in point clouds. To demonstrate the generality of the proposed adaptation framework, experiments on more challenging cross-device adaptation are further conducted, and a new LiDAR dataset with high-quality annotated point clouds is released to promote future research.

CVMar 17, 2017
Computer Aided Detection of Anemia-like Pallor

Sohini Roychowdhury, Donny Sun, Matthew Bihis et al.

Paleness or pallor is a manifestation of blood loss or low hemoglobin concentrations in the human blood that can be caused by pathologies such as anemia. This work presents the first automated screening system that utilizes pallor site images, segments, and extracts color and intensity-based features for multi-class classification of patients with high pallor due to anemia-like pathologies, normal patients and patients with other abnormalities. This work analyzes the pallor sites of conjunctiva and tongue for anemia screening purposes. First, for the eye pallor site images, the sclera and conjunctiva regions are automatically segmented for regions of interest. Similarly, for the tongue pallor site images, the inner and outer tongue regions are segmented. Then, color-plane based feature extraction is performed followed by machine learning algorithms for feature reduction and image level classification for anemia. In this work, a suite of classification algorithms image-level classifications for normal (class 0), pallor (class 1) and other abnormalities (class 2). The proposed method achieves 86% accuracy, 85% precision and 67% recall in eye pallor site images and 98.2% accuracy and precision with 100% recall in tongue pallor site images for classification of images with pallor. The proposed pallor screening system can be further fine-tuned to detect the severity of anemia-like pathologies using controlled set of local images that can then be used for future benchmarking purposes.

CVOct 24, 2016
Automated OCT Segmentation for Images with DME

Sohini Roychowdhury, Dara D. Koozekanani, Michael Reinsbach et al.

This paper presents a novel automated system that segments six sub-retinal layers from optical coherence tomography (OCT) image stacks of healthy patients and patients with diabetic macular edema (DME). First, each image in the OCT stack is denoised using a Wiener deconvolution algorithm that estimates the additive speckle noise variance using a novel Fourier-domain based structural error. This denoising method enhances the image SNR by an average of 12dB. Next, the denoised images are subjected to an iterative multi-resolution high-pass filtering algorithm that detects seven sub-retinal surfaces in six iterative steps. The thicknesses of each sub-retinal layer for all scans from a particular OCT stack are then compared to the manually marked groundtruth. The proposed system uses adaptive thresholds for denoising and segmenting each image and hence it is robust to disruptions in the retinal micro-structure due to DME. The proposed denoising and segmentation system has an average error of 1.2-5.8 $μm$ and 3.5-26$μm$ for segmenting sub-retinal surfaces in normal and abnormal images with DME, respectively. For estimating the sub-retinal layer thicknesses, the proposed system has an average error of 0.2-2.5 $μm$ and 1.8-18 $μm$ in normal and abnormal images, respectively. Additionally, the average inner sub-retinal layer thickness in abnormal images is estimated as 275$μm (r=0.92)$ with an average error of 9.3 $μm$, while the average thickness of the outer layers in abnormal images is estimated as 57.4$μm (r=0.74)$ with an average error of 3.5 $μm$. The proposed system can be useful for tracking the disease progression for DME over a period of time.

MED-PHAug 13, 2016
Automated Selection of Uniform Regions for CT Image Quality Detection

Maitham D Naeemi, Adam M Alessio, Sohini Roychowdhury

CT images are widely used in pathology detection and follow-up treatment procedures. Accurate identification of pathological features requires diagnostic quality CT images with minimal noise and artifact variation. In this work, a novel Fourier-transform based metric for image quality (IQ) estimation is presented that correlates to additive CT image noise. In the proposed method, two windowed CT image subset regions are analyzed together to identify the extent of variation in the corresponding Fourier-domain spectrum. The two square windows are chosen such that their center pixels coincide and one window is a subset of the other. The Fourier-domain spectral difference between these two sub-sampled windows is then used to isolate spatial regions-of-interest (ROI) with low signal variation (ROI-LV) and high signal variation (ROI-HV), respectively. Finally, the spatial variance ($var$), standard deviation ($std$), coefficient of variance ($cov$) and the fraction of abdominal ROI pixels in ROI-LV ($ν'(q)$), are analyzed with respect to CT image noise. For the phantom CT images, $var$ and $std$ correlate to CT image noise ($|r|>0.76$ ($p\ll0.001$)), though not as well as $ν'(q)$ ($r=0.96$ ($p\ll0.001$)). However, for the combined phantom and patient CT images, $var$ and $std$ do not correlate well with CT image noise ($|r|<0.46$ ($p\ll0.001$)) as compared to $ν'(q)$ ($r=0.95$ ($p\ll0.001$)). Thus, the proposed method and the metric, $ν'(q)$, can be useful to quantitatively estimate CT image noise.

CVMay 24, 2016
Blind Analysis of CT Image Noise Using Residual Denoised Images

Sohini Roychowdhury, Nathan Hollraft, Adam Alessio

CT protocol design and quality control would benefit from automated tools to estimate the quality of generated CT images. These tools could be used to identify erroneous CT acquisitions or refine protocols to achieve certain signal to noise characteristics. This paper investigates blind estimation methods to determine global signal strength and noise levels in chest CT images. Methods: We propose novel performance metrics corresponding to the accuracy of noise and signal estimation. We implement and evaluate the noise estimation performance of six spatial- and frequency- based methods, derived from conventional image filtering algorithms. Algorithms were tested on patient data sets from whole-body repeat CT acquisitions performed with a higher and lower dose technique over the same scan region. Results: The proposed performance metrics can evaluate the relative tradeoff of filter parameters and noise estimation performance. The proposed automated methods tend to underestimate CT image noise at low-flux levels. Initial application of methodology suggests that anisotropic diffusion and Wavelet-transform based filters provide optimal estimates of noise. Furthermore, methodology does not provide accurate estimates of absolute noise levels, but can provide estimates of relative change and/or trends in noise levels.

CVMar 26, 2016
Classification of Large-Scale Fundus Image Data Sets: A Cloud-Computing Framework

Sohini Roychowdhury

Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.

CVMar 26, 2016
A generalized flow for multi-class and binary classification tasks: An Azure ML approach

Matthew Bihis, Sohini Roychowdhury

The constant growth in the present day real-world databases pose computational challenges for a single computer. Cloud-based platforms, on the other hand, are capable of handling large volumes of information manipulation tasks, thereby necessitating their use for large real-world data set computations. This work focuses on creating a novel Generalized Flow within the cloud-based computing platform: Microsoft Azure Machine Learning Studio (MAMLS) that accepts multi-class and binary classification data sets alike and processes them to maximize the overall classification accuracy. First, each data set is split into training and testing data sets, respectively. Then, linear and nonlinear classification model parameters are estimated using the training data set. Data dimensionality reduction is then performed to maximize classification accuracy. For multi-class data sets, data centric information is used to further improve overall classification accuracy by reducing the multi-class classification to a series of hierarchical binary classification tasks. Finally, the performance of optimized classification model thus achieved is evaluated and scored on the testing data set. The classification characteristics of the proposed flow are comparatively evaluated on 3 public data sets and a local data set with respect to existing state-of-the-art methods. On the 3 public data sets, the proposed flow achieves 78-97.5% classification accuracy. Also, the local data set, created using the information regarding presence of Diabetic Retinopathy lesions in fundus images, results in 85.3-95.7% average classification accuracy, which is higher than the existing methods. Thus, the proposed generalized flow can be useful for a wide range of application-oriented "big data sets".

CVNov 11, 2015
Facial Expression Detection using Patch-based Eigen-face Isomap Networks

Sohini Roychowdhury

Automated facial expression detection problem pose two primary challenges that include variations in expression and facial occlusions (glasses, beard, mustache or face covers). In this paper we introduce a novel automated patch creation technique that masks a particular region of interest in the face, followed by Eigen-value decomposition of the patched faces and generation of Isomaps to detect underlying clustering patterns among faces. The proposed masked Eigen-face based Isomap clustering technique achieves 75% sensitivity and 66-73% accuracy in classification of faces with occlusions and smiling faces in around 1 second per image. Also, betweenness centrality, Eigen centrality and maximum information flow can be used as network-based measures to identify the most significant training faces for expression classification tasks. The proposed method can be used in combination with feature-based expression classification methods in large data sets for improving expression classification accuracies.

CVNov 7, 2015
A Survey of the Trends in Facial and Expression Recognition Databases and Methods

Sohini Roychowdhury, Michelle Emmons

Automated facial identification and facial expression recognition have been topics of active research over the past few decades. Facial and expression recognition find applications in human-computer interfaces, subject tracking, real-time security surveillance systems and social networking. Several holistic and geometric methods have been developed to identify faces and expressions using public and local facial image databases. In this work we present the evolution in facial image data sets and the methodologies for facial identification and recognition of expressions such as anger, sadness, happiness, disgust, fear and surprise. We observe that most of the earlier methods for facial and expression recognition aimed at improving the recognition rates for facial feature-based methods using static images. However, the recent methodologies have shifted focus towards robust implementation of facial/expression recognition from large image databases that vary with space (gathered from the internet) and time (video recordings). The evolution trends in databases and methodologies for facial and expression recognition can be useful for assessing the next-generation topics that may have applications in security systems or personal identification systems that involve "Quantitative face" assessments.