LG · Apr 16, 2019
ASD-DiagNet: A hybrid learning approach for detection of Autism Spectrum Disorder using fMRI data
Taban Eslami, Vahid Mirjalili, Alvis Fong et al.
Mental disorders such as Autism Spectrum Disorders (ASD) are heterogeneous disorders that are notoriously difficult to diagnose, especially in children. The current psychiatric diagnostic process is based purely on behavioural observation of symptomology (DSM-5/ICD-10) and may be prone to over-prescribing of drugs due to misdiagnosis. To move the field in a more quantitative direction, we need advanced and scalable machine learning infrastructure that will allow us to identify reliable biomarkers of mental health disorders. In this paper, we propose a framework called ASD-DiagNet for distinguishing subjects with ASD from healthy subjects using only fMRI data. We designed and implemented a joint learning procedure using an autoencoder and a single-layer perceptron, which improves the quality of the extracted features and optimizes the model's parameters. Further, we designed and implemented a data augmentation strategy, based on linear interpolation between available feature vectors, that allows us to produce the synthetic data needed for training machine learning models. The proposed approach is evaluated on a public dataset provided by the Autism Brain Imaging Data Exchange, comprising 1035 subjects from 17 different brain imaging centers. Our machine learning model outperforms other state-of-the-art methods on data from 13 imaging centers, with an increase in classification accuracy of up to 20% and a maximum accuracy of 80%. In addition to yielding better quality, the machine learning technique presented in this paper offers large advantages in execution time (40 minutes vs. 6 hours for other methods). The implemented code is available under the GPL license on our lab's GitHub portal (https://github.com/pcdslab/ASD-DiagNet).
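The abstract describes augmentation by linear interpolation between existing feature vectors. The sketch below shows one way this could look in plain Python, assuming a SMOTE-style scheme in which each synthetic vector lies on the segment between a sample and one of its k nearest neighbours from the same class. The helper names (`augment_class`, `interpolate`), the Euclidean neighbour search, and the toy vectors are illustrative assumptions, not the ASD-DiagNet code from the linked repository.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Return a + lam * (b - a): a point on the segment a->b when lam is in [0, 1]."""
    return [x + lam * (y - x) for x, y in zip(a, b)]


def augment_class(samples: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: create n_new synthetic vectors for ONE class.

    Each synthetic vector interpolates a randomly chosen sample with one of its
    k nearest neighbours from the same class (a SMOTE-style assumption).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)  # seeded for reproducibility
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        anchor = samples[i]
        # Candidate partners: every other sample of the same class, nearest first.
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: euclidean(anchor, s))[:k]
        partner = rng.choice(neighbours)
        synthetic.append(interpolate(anchor, partner, rng.random()))
    return synthetic


if __name__ == "__main__":
    # Toy 3-dimensional "feature vectors" for a single class (made-up numbers).
    class_features = [[0.1, 0.4, 0.2], [0.2, 0.5, 0.1], [0.0, 0.3, 0.3], [0.3, 0.6, 0.2]]
    for v in augment_class(class_features, n_new=3, k=2):
        print([round(x, 3) for x in v])
```

Restricting interpolation partners to the same class keeps every synthetic vector between existing samples of that class, which is the usual rationale for interpolation-based oversampling.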
CV · Sep 26, 2025
Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
Vahid Mirjalili, Ramin Giahi, Sriram Kollipara et al.
Spatial understanding is a critical capability for vision foundation models. While recent advances in large vision models or vision-language models (VLMs) have expanded recognition capabilities, most benchmarks emphasize localization accuracy rather than whether models capture how objects are arranged and related within a scene. This gap is consequential; effective scene understanding requires not only identifying objects, but reasoning about their relative positions, groupings, and depth. In this paper, we present a systematic benchmark for object-centric spatial reasoning in foundation models. Using a controlled synthetic dataset, we evaluate state-of-the-art vision models (e.g., GroundingDINO, Florence-2, OWLv2) and large VLMs (e.g., InternVL, LLaVA, GPT-4o) across three tasks: spatial localization, spatial reasoning, and downstream retrieval. We find a stable trade-off: detectors such as GroundingDINO and OWLv2 deliver precise boxes with limited relational reasoning, while VLMs like SmolVLM and GPT-4o provide coarse layout cues and fluent captions but struggle with fine-grained spatial context. Our study highlights the gap between localization and true spatial understanding, and points toward the need for spatially aware foundation models in the community.
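The distinction the benchmark draws between localization and relational reasoning can be made concrete with bounding boxes. In the hypothetical sketch below, `iou` scores how well a box is localized, while `spatial_relations` derives coarse relations ("left of", "above") between two boxes using simple edge rules. These rules are an assumption for illustration; the abstract does not specify how the benchmark's ground-truth relations are defined.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates; y grows downward, as in most image APIs."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so an empty (inverted) box has no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: the usual localization score for a predicted box."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def spatial_relations(a: Box, b: Box) -> Set[str]:
    """Hypothetical rule set: relations of box a relative to box b."""
    rels: Set[str] = set()
    if a.x2 <= b.x1:
        rels.add("left of")
    if a.x1 >= b.x2:
        rels.add("right of")
    if a.y2 <= b.y1:
        rels.add("above")
    if a.y1 >= b.y2:
        rels.add("below")
    if not rels:
        rels.add("overlapping")  # boxes share some horizontal and vertical extent
    return rels


if __name__ == "__main__":
    cup, plate = Box(10, 10, 30, 30), Box(50, 40, 90, 60)
    print(spatial_relations(cup, plate), round(iou(cup, plate), 3))
```

A detector can score well on IoU for every object without ever producing a relation such as "left of"; the relation has to be derived from the boxes, which is the kind of gap the study examines.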
IR · Jul 22, 2025
VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
Ramin Giahi, Kehui Yao, Sriram Kollipara et al.
Multimodal learning plays a critical role in today's e-commerce recommendation platforms, enabling accurate recommendations and product understanding. However, existing vision-language models, such as CLIP, face key challenges in e-commerce recommendation systems: 1) Weak object-level alignment, where global image embeddings fail to capture fine-grained product attributes, leading to suboptimal retrieval performance; 2) Ambiguous textual representations, where product descriptions often lack contextual clarity, affecting cross-modal matching; and 3) Domain mismatch, as generic vision-language models may not generalize well to e-commerce-specific data. To address these limitations, we propose VL-CLIP, a framework that enhances CLIP embeddings by integrating Visual Grounding for fine-grained visual understanding and an LLM-based agent for generating enriched text embeddings. Visual Grounding refines image representations by localizing key products, while the LLM agent enhances textual features by disambiguating product descriptions. Our approach significantly improves retrieval accuracy, multimodal retrieval effectiveness, and recommendation quality across tens of millions of items on one of the largest e-commerce platforms in the U.S., increasing click-through rate (CTR) by 18.6%, add-to-cart rate (ATC) by 15.5%, and gross merchandise value (GMV) by 4.0%. Additional experimental results show that our framework outperforms existing vision-language models, including CLIP, FashionCLIP, and GCL, in both precision and semantic alignment, demonstrating the potential of combining object-aware visual grounding and LLM-enhanced text representation for robust multimodal recommendations.
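CLIP-style retrieval typically ranks catalog items by cosine similarity between a query embedding and item embeddings. The sketch below assumes VL-CLIP's enhanced embeddings are consumed by such a ranking step; the `top_k` helper and the toy 2-D vectors are hypothetical, and real embeddings would come from the grounded image encoder and the LLM-refined text encoder described above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query: Sequence[float], catalog: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Rank catalog items by cosine similarity to the query embedding (linear scan)."""
    q = l2_normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, l2_normalize(vec))))
        for item_id, vec in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up embeddings; in practice these would be high-dimensional model outputs.
    catalog = {"red-sneaker": [0.9, 0.1], "blue-jacket": [0.1, 0.95], "red-jacket": [0.7, 0.6]}
    print(top_k([1.0, 0.2], catalog, k=2))
```

The linear scan is only for clarity; at the catalog sizes mentioned in the abstract, an approximate nearest-neighbour index would normally replace it.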
CV · Mar 28, 2025
GmNet: Revisiting Gating Mechanisms From A Frequency View
Yifan Wang, Xu Ma, Yitian Zhang et al.
Gating mechanisms have emerged as an effective strategy, integrated into model designs beyond recurrent neural networks, for addressing long-range dependency problems. Broadly speaking, they provide adaptive control over the information flow while maintaining computational efficiency. However, there is a lack of theoretical analysis of how gating mechanisms work in neural networks. In this paper, inspired by the convolution theorem, we systematically explore the effect of gating mechanisms on the training dynamics of neural networks from a frequency perspective. We investigate the interaction between the element-wise product and activation functions in managing the responses to different frequency components. Leveraging these insights, we propose the Gating Mechanism Network (GmNet), a lightweight model designed to efficiently utilize the information of various frequency components. It minimizes the low-frequency bias present in existing lightweight models. GmNet achieves impressive performance in terms of both effectiveness and efficiency on the image classification task.
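The convolution theorem referred to above says that an element-wise product in the signal domain corresponds to a scaled circular convolution in the frequency domain: DFT(x ⊙ g) = (1/N) · DFT(x) ⊛ DFT(g). The toy example below illustrates why a gate therefore mixes frequency components: gating a frequency-1 signal with a frequency-2 gate produces energy at the sum and difference frequencies (3 and 1). It is a plain-Python demonstration of the theorem on made-up inputs, not GmNet's architecture or its training analysis.

```python
import cmath
import math
from typing import List, Sequence, Set


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform: X_k = sum_t x_t * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def circular_convolve(a: Sequence[complex], b: Sequence[complex]) -> List[complex]:
    """(a ⊛ b)_k = sum_m a_m * b_{(k - m) mod N}."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def gate(x: Sequence[float], g: Sequence[float]) -> List[float]:
    """Element-wise product, the core operation of a gating unit."""
    return [xi * gi for xi, gi in zip(x, g)]


def active_bins(spectrum: Sequence[complex], tol: float = 1e-9) -> Set[int]:
    """Indices of frequency bins with non-negligible magnitude."""
    return {k for k, c in enumerate(spectrum) if abs(c) > tol}


# Toy inputs: a frequency-1 "signal" and a frequency-2 "gate" over N samples.
N = 8
signal = [math.cos(2 * math.pi * 1 * t / N) for t in range(N)]
gate_values = [math.cos(2 * math.pi * 2 * t / N) for t in range(N)]

if __name__ == "__main__":
    print("signal bins:", sorted(active_bins(dft(signal))))
    print("gate bins:  ", sorted(active_bins(dft(gate_values))))
    print("gated bins: ", sorted(active_bins(dft(gate(signal, gate_values)))))
```

For real-valued inputs, bins k and N - k mirror each other, so bins 7 and 5 in the gated output are the negative-frequency counterparts of bins 1 and 3.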
CV · Jan 2, 2020
PrivacyNet: Semi-Adversarial Networks for Multi-attribute Face Privacy
Vahid Mirjalili, Sebastian Raschka, Arun Ross
Recent research has established the possibility of deducing soft-biometric attributes such as age, gender and race from an individual's face image with high accuracy. However, this raises privacy concerns, especially when face images collected for biometric recognition purposes are used for attribute analysis without the person's consent. To address this problem, we develop a technique for imparting soft-biometric privacy to face images via an image perturbation methodology. The image perturbation is undertaken using a GAN-based Semi-Adversarial Network (SAN) - referred to as PrivacyNet - that modifies an input face image such that it can be used by a face matcher for matching purposes but cannot be reliably used by an attribute classifier. Further, PrivacyNet allows a person to choose specific attributes that have to be obfuscated in the input face images (e.g., age and race), while allowing for other types of attributes to be extracted (e.g., gender). Extensive experiments using multiple face matchers, multiple age/gender/race classifiers, and multiple face datasets demonstrate the generalizability of the proposed multi-attribute privacy-enhancing method across multiple face and attribute classifiers.
CV · May 12, 2019
Some Research Problems in Biometrics: The Future Beckons
Arun Ross, Sudipta Banerjee, Cunjian Chen et al.
The need for reliably determining the identity of a person is critical in a number of different domains ranging from personal smartphones to border security; from autonomous vehicles to e-voting; from tracking child vaccinations to preventing human trafficking; from crime scene investigation to personalization of customer service. Biometrics, which entails the use of biological attributes such as face, fingerprints and voice for recognizing a person, is being increasingly used in several such applications. While biometric technology has made rapid strides over the past decade, there are several fundamental issues that are yet to be satisfactorily resolved. In this article, we will discuss some of these issues and enumerate some of the exciting challenges in this field.
CV · May 3, 2019
FlowSAN: Privacy-enhancing Semi-Adversarial Networks to Confound Arbitrary Face-based Gender Classifiers
Vahid Mirjalili, Sebastian Raschka, Arun Ross
Privacy concerns in the modern digital age have prompted researchers to develop techniques that allow users to selectively suppress certain information in collected data while allowing for other information to be extracted. In this regard, Semi-Adversarial Networks (SAN) have recently emerged as a method for imparting soft-biometric privacy to face images. SAN enables modifications of input face images so that the resulting face images can still be reliably used by arbitrary conventional face matchers for recognition purposes, while attribute classifiers, such as gender classifiers, are confounded. However, the generalizability of SANs across arbitrary gender classifiers has remained an open concern. In this work, we propose a new method, FlowSAN, for allowing SANs to generalize to multiple unseen gender classifiers. We propose combining a diverse set of SAN models so that they compensate for each other's weaknesses, thereby forming a robust model with improved generalization capability. Extensive experiments using different unseen gender classifiers and face matchers demonstrate the efficacy of the proposed paradigm in imparting gender privacy to face images.
LG · Jan 20, 2019
Rank consistent ordinal regression for neural networks with application to age estimation
Wenzhi Cao, Vahid Mirjalili, Sebastian Raschka
In many real-world prediction tasks, class labels include information about the relative ordering between labels, which is not captured by commonly-used loss functions such as multi-category cross-entropy. Recently, the deep learning community adopted ordinal regression frameworks to take such ordering information into account. Neural networks were equipped with ordinal regression capabilities by transforming ordinal targets into binary classification subtasks. However, this method suffers from inconsistencies among the different binary classifiers. To resolve these inconsistencies, we propose the COnsistent RAnk Logits (CORAL) framework with strong theoretical guarantees for rank-monotonicity and consistent confidence scores. Moreover, the proposed method is architecture-agnostic and can extend arbitrary state-of-the-art deep neural network classifiers for ordinal regression tasks. The empirical evaluation of the proposed rank-consistent method on a range of face-image datasets for age prediction shows a substantial reduction of the prediction error compared to the reference ordinal regression network.
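The sketch below illustrates the binary-subtask encoding and rank prediction described above. It follows my reading of the CORAL design (one score shared by all K-1 binary tasks, with each task adding its own bias), which the abstract itself does not spell out; the function names are hypothetical, and ranks are 0-indexed here, so the predicted rank is simply the number of tasks whose probability exceeds 0.5.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def label_to_levels(label: int, num_classes: int) -> List[int]:
    """Rank label in {0, ..., K-1} -> K-1 binary targets [label > 0, ..., label > K-2]."""
    return [1 if label > k else 0 for k in range(num_classes - 1)]


def task_logits(score: float, biases: Sequence[float]) -> List[float]:
    """One shared score g(x) (same weights for every binary task) plus a per-task bias."""
    return [score + b for b in biases]


def coral_loss(score: float, biases: Sequence[float], label: int, num_classes: int) -> float:
    """Sum of binary cross-entropies over the K-1 tasks, computed from logits.

    -log(sigmoid(z)) == softplus(-z) and -log(1 - sigmoid(z)) == softplus(z).
    """
    levels = label_to_levels(label, num_classes)
    logits = task_logits(score, biases)
    return sum(softplus(-z) if y == 1 else softplus(z) for z, y in zip(logits, levels))


def predict_rank(score: float, biases: Sequence[float], threshold: float = 0.5) -> int:
    """Predicted rank = number of binary tasks whose probability exceeds the threshold."""
    return sum(1 for z in task_logits(score, biases) if sigmoid(z) > threshold)


if __name__ == "__main__":
    biases = [2.0, 1.0, -1.0, -2.0]  # made-up, non-increasing biases for K = 5 ranks
    print(label_to_levels(3, 5), predict_rank(0.0, biases), round(coral_loss(0.0, biases, 2, 5), 4))
```

Because every task shares the same score and differs only by its bias, non-increasing biases make the task probabilities non-increasing as well, so the tasks above the threshold always form a prefix and the predicted rank cannot contradict the individual binary decisions.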
CV · Aug 31, 2018
Spoofing PRNU Patterns of Iris Sensors while Preserving Iris Recognition
Sudipta Banerjee, Vahid Mirjalili, Arun Ross
The principle of Photo Response Non-Uniformity (PRNU) is used to link an image with its source, i.e., the sensor that produced it. In this work, we investigate whether it is possible to modify an iris image acquired using one sensor in order to spoof the PRNU noise pattern of a different sensor. To this end, we develop an image perturbation routine that iteratively modifies blocks of pixels in the original iris image such that its PRNU pattern approaches that of a target sensor. Experiments indicate the efficacy of the proposed perturbation method in spoofing PRNU patterns present in an iris image whilst still retaining its biometric content.
CV · Jul 31, 2018
Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers
Vahid Mirjalili, Sebastian Raschka, Arun Ross
Recent research has proposed the use of Semi Adversarial Networks (SAN) for imparting privacy to face images. SANs are convolutional autoencoders that perturb face images such that the perturbed images cannot be reliably used by an attribute classifier (e.g., a gender classifier) but can still be used by a face matcher for matching purposes. However, the generalizability of SANs across multiple arbitrary gender classifiers has not been demonstrated in the literature. In this work, we tackle the generalization issue by designing an ensemble SAN model that generates a diverse set of perturbed outputs for a given input face image. This is accomplished by enforcing diversity among the individual models in the ensemble through the use of different data augmentation techniques. The goal is to ensure that at least one of the perturbed output faces will confound an arbitrary, previously unseen gender classifier. Extensive experiments using different unseen gender classifiers and face matchers are performed to demonstrate the efficacy of the proposed paradigm in imparting gender privacy to face images.
CV · Dec 1, 2017
Semi-Adversarial Networks: Convolutional Autoencoders for Imparting Privacy to Face Images
Vahid Mirjalili, Sebastian Raschka, Anoop Namboodiri et al.
In this paper, we design and evaluate a convolutional autoencoder that perturbs an input face image to impart privacy to a subject. Specifically, the proposed autoencoder transforms an input face image such that the transformed image can be successfully used for face recognition but not for gender classification. In order to train this autoencoder, we propose a novel training scheme, referred to as semi-adversarial training in this work. The training is facilitated by attaching a semi-adversarial module consisting of a pseudo gender classifier and a pseudo face matcher to the autoencoder. The objective function utilized for training this network has three terms: one to ensure that the perturbed image is a realistic face image; another to ensure that the gender attributes of the face are confounded; and a third to ensure that biometric recognition performance due to the perturbed image is not impacted. Extensive experiments confirm the efficacy of the proposed architecture in extending gender privacy to face images.
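The three-term objective described above can be sketched as a weighted sum. The concrete forms here are assumptions for illustration: pixel-wise MSE to the input stands in for the realism term, binary cross-entropy against the flipped gender label stands in for the confounding term, and squared distance between matcher descriptors stands in for the recognition term. The function names, equal weights, and toy values are hypothetical; in the actual network these quantities would come from the autoencoder, the pseudo gender classifier, and the pseudo face matcher.

```python
import math
from typing import Sequence, Tuple


def pixel_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def binary_cross_entropy(p: float, target: int, eps: float = 1e-7) -> float:
    """BCE of a predicted probability against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


def semi_adversarial_loss(
    original: Sequence[float],
    perturbed: Sequence[float],
    p_male_perturbed: float,
    is_male: bool,
    emb_original: Sequence[float],
    emb_perturbed: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Hypothetical three-term objective: realism + gender confounding + matching."""
    w_real, w_gender, w_match = weights
    # Term 1 (proxy): keep the perturbed image close to a realistic face (here, the input).
    realism = pixel_mse(perturbed, original)
    # Term 2: push the pseudo gender classifier toward the OPPOSITE label.
    gender = binary_cross_entropy(p_male_perturbed, 0 if is_male else 1)
    # Term 3: keep the pseudo face matcher's descriptor unchanged.
    match = squared_distance(emb_original, emb_perturbed)
    return w_real * realism + w_gender * gender + w_match * match


if __name__ == "__main__":
    img, emb = [0.2, 0.5, 0.9], [0.1, 0.3]  # made-up flattened image and descriptor
    print(round(semi_adversarial_loss(img, [0.25, 0.5, 0.85], 0.3, True, emb, [0.12, 0.29]), 4))
```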