CVAug 19, 2024Code
From Darkness to Detail: Frequency-Aware SSMs for Low-Light VisionEashan Adhikarla, Kai Zhang, Gong Chen et al.
Low-light image enhancement remains a persistent challenge in computer vision, where state-of-the-art models are often hampered by hardware constraints and computational inefficiency, particularly at high resolutions. While foundational architectures like transformers and diffusion models have advanced the field, their computational complexity limits their deployment on edge devices. We introduce ExpoMamba, a novel architecture that integrates a frequency-aware state-space model within a modified U-Net. ExpoMamba is designed to address mixed-exposure challenges by decoupling the modeling of amplitude (intensity) and phase (structure) in the frequency domain. This allows for targeted enhancement, making it highly effective for real-time applications, including downstream tasks like object detection and segmentation. Our experiments on six benchmark datasets show that ExpoMamba is up to 2-3x faster than competing models and achieves a 6.8\% PSNR improvement, establishing a new state-of-the-art in efficient, high-quality low-light enhancement. Source code: https://www.github.com/eashanadhikarla/ExpoMamba.
LGApr 29, 2024Code
Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model ErasJun Yu, Yutong Dai, Xiaokang Liu et al.
MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the past twenty years, MTL has become widely recognized as a flexible and effective approach in various fields, including CV, NLP, recommendation systems, disease prognosis and diagnosis, and robotics. This survey provides a comprehensive overview of the evolution of MTL, encompassing the technical aspects of cutting-edge methods from traditional approaches to deep learning and the latest trend of pretrained foundation models. Our survey methodically categorizes MTL techniques into five key areas: regularization, relationship learning, feature propagation, optimization, and pre-training. This categorization not only chronologically outlines the development of MTL but also dives into various specialized strategies within each category. Furthermore, the survey reveals how the MTL evolves from handling a fixed set of tasks to embracing a more flexible approach free from task or modality constraints. It explores the concepts of task-promptable and -agnostic training, along with the capacity for ZSL, which unleashes the untapped potential of this historically coveted learning paradigm. Overall, we hope this survey provides the research community with a comprehensive overview of the advancements in MTL from its inception in 1997 to the present in 2023. We address present challenges and look ahead to future possibilities, shedding light on the opportunities and potential avenues for MTL research in a broad manner. This project is publicly available at https://github.com/junfish/Awesome-Multitask-Learning.
CVJul 18, 2024
Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image EnhancementEashan Adhikarla, Kai Zhang, Rosaura G. VidalMata et al.
Despite recent strides made by AI in image processing, the issue of mixed exposure, pivotal in many real-world scenarios like surveillance and photography, remains inadequately addressed. Traditional image enhancement techniques and current transformer models are limited with primary focus on either overexposure or underexposure. To bridge this gap, we introduce the Unified-Exposure Guided Transformer (Unified-EGformer). Our proposed solution is built upon advanced transformer architectures, equipped with local pixel-level refinement and global refinement blocks for color correction and image-wide adjustments. We employ a guided attention mechanism to precisely identify exposure-compromised regions, ensuring its adaptability across various real-world conditions. U-EGformer, with a lightweight design featuring a memory footprint (peak memory) of only $\sim$1134 MB (0.1 Million parameters) and an inference time of 95 ms (9.61x faster than the average), is a viable choice for real-time applications such as surveillance and autonomous navigation. Additionally, our model is highly generalizable, requiring minimal fine-tuning to handle multiple tasks and datasets with a single architecture.
CVOct 24, 2023
Deep Feature Registration for Unsupervised Domain AdaptationYoushan Zhang, Brian D. Davison
While unsupervised domain adaptation has been explored to leverage the knowledge from a labeled source domain to an unlabeled target domain, existing methods focus on the distribution alignment between two domains. However, how to better align source and target features is not well addressed. In this paper, we propose a deep feature registration (DFR) model to generate registered features that maintain domain invariant features and simultaneously minimize the domain-dissimilarity of registered features and target features via histogram matching. We further employ a pseudo label refinement process, which considers both probabilistic soft selection and center-based hard selection to improve the quality of pseudo labels in the target domain. Extensive experiments on multiple UDA benchmarks demonstrate the effectiveness of our DFR model, resulting in new state-of-the-art performance.
CVJan 25Code
RemEdit: Efficient Diffusion Editing with Riemannian GeometryEashan Adhikarla, Brian D. Davison
Controllable image generation is fundamental to the success of modern generative AI, yet it faces a critical trade-off between semantic fidelity and inference speed. The RemEdit diffusion-based framework addresses this trade-off with two synergistic innovations. First, for editing fidelity, we navigate the latent space as a Riemannian manifold. A mamba-based module efficiently learns the manifold's structure, enabling direct and accurate geodesic path computation for smooth semantic edits. This control is further refined by a dual-SLERP blending technique and a goal-aware prompt enrichment pass from a Vision-Language Model. Second, for additional acceleration, we introduce a novel task-specific attention pruning mechanism. A lightweight pruning head learns to retain tokens essential to the edit, enabling effective optimization without the semantic degradation common in content-agnostic approaches. RemEdit surpasses prior state-of-the-art editing frameworks while maintaining real-time performance under 50% pruning. Consequently, RemEdit establishes a new benchmark for practical and powerful image editing. Source code: https://www.github.com/eashanadhikarla/RemEdit.
CLMay 26, 2023Code
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical TasksKai Zhang, Rong Zhou, Eashan Adhikarla et al.
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners, and patients. Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model, designed as a generalist capable of performing various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation, and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts. Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.
CVFeb 5, 2022Code
Memory Defense: More Robust Classification via a Memory-Masking AutoencoderEashan Adhikarla, Dan Luo, Brian D. Davison
Many deep neural networks are susceptible to minute perturbations of images that have been carefully crafted to cause misclassification. Ideally, a robust classifier would be immune to small variations in input images, and a number of defensive approaches have been created as a result. One method would be to discern a latent representation which could ignore small changes to the input. However, typical autoencoders easily mingle inter-class latent representations when there are strong similarities between classes, making it harder for a decoder to accurately project the image back to the original high-dimensional space. We propose a novel framework, Memory Defense, an augmented classifier with a memory-masking autoencoder to counter this challenge. By masking other classes, the autoencoder learns class-specific independent latent representations. We test the model's robustness against four widely used attacks. Experiments on the Fashion-MNIST & CIFAR-10 datasets demonstrate the superiority of our model. We make available our source code at GitHub repository: https://github.com/eashanadhikarla/MemDefense
LGJul 5, 2020Code
Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text ClassificationHui Ye, Zhiyu Chen, Da-Han Wang et al.
Extreme multi-label text classification (XMTC) is a task for tagging a given text with the most relevant labels from an extremely large label set. We propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes the recently released generalized autoregressive pretrained model (XLNet) to learn a dense representation for the input text. We propose Adaptive Probabilistic Label Clusters (APLC) to approximate the cross entropy loss by exploiting the unbalanced label distribution to form clusters that explicitly reduce the computational time. Our experiments, carried out on five benchmark datasets, show that our approach has achieved new state-of-the-art results on four benchmark datasets. Our source code is available publicly at https://github.com/huiyegit/APLC_XLNet.
NINov 10, 2018Code
IP Geolocation through Reverse DNSOvidiu Dan, Vaibhav Parikh, Brian D. Davison
IP Geolocation databases are widely used in online services to map end user IP addresses to their geographical locations. However, they use proprietary geolocation methods and in some cases they have poor accuracy. We propose a systematic approach to use publicly accessible reverse DNS hostnames for geolocating IP addresses. Our method is designed to be combined with other geolocation data sources. We cast the task as a machine learning problem where for a given hostname, we generate and rank a list of potential location candidates. We evaluate our approach against three state of the art academic baselines and two state of the art commercial IP geolocation databases. We show that our work significantly outperforms the academic baselines, and is complementary and competitive with commercial databases. To aid reproducibility, we open source our entire approach.
CVOct 7, 2025
Diffusion Models for Low-Light Image Enhancement: A Multi-Perspective Taxonomy and Performance AnalysisEashan Adhikarla, Yixin Liu, Brian D. Davison
Low-light image enhancement (LLIE) is vital for safety-critical applications such as surveillance, autonomous navigation, and medical imaging, where visibility degradation can impair downstream task performance. Recently, diffusion models have emerged as a promising generative paradigm for LLIE due to their capacity to model complex image distributions via iterative denoising. This survey provides an up-to-date critical analysis of diffusion models for LLIE, distinctively featuring an in-depth comparative performance evaluation against Generative Adversarial Network and Transformer-based state-of-the-art methods, a thorough examination of practical deployment challenges, and a forward-looking perspective on the role of emerging paradigms like foundation models. We propose a multi-perspective taxonomy encompassing six categories: Intrinsic Decomposition, Spectral & Latent, Accelerated, Guided, Multimodal, and Autonomous; that map enhancement methods across physical priors, conditioning schemes, and computational efficiency. Our taxonomy is grounded in a hybrid view of both the model mechanism and the conditioning signals. We evaluate qualitative failure modes, benchmark inconsistencies, and trade-offs between interpretability, generalization, and inference efficiency. We also discuss real-world deployment constraints (e.g., memory, energy use) and ethical considerations. This survey aims to guide the next generation of diffusion-based LLIE research by highlighting trends and surfacing open research questions, including novel conditioning, real-time adaptation, and the potential of foundation models.
LGNov 3, 2021
Deep Least Squares Alignment for Unsupervised Domain AdaptationYoushan Zhang, Brian D. Davison
Unsupervised domain adaptation leverages rich information from a labeled source domain to model an unlabeled target domain. Existing methods attempt to align the cross-domain distributions. However, the statistical representations of the alignment of the two domains are not well addressed. In this paper, we propose deep least squares alignment (DLSA) to estimate the distribution of the two domains in a latent space by parameterizing a linear model. We further develop marginal and conditional adaptation loss to reduce the domain discrepancy by minimizing the angle between fitting lines and intercept differences and further learning domain invariant features. Extensive experiments demonstrate that the proposed DLSA model is effective in aligning domain distributions and outperforms state-of-the-art methods.
CVJun 22, 2021
Automatic Head Overcoat Thickness Measure with NASNet-Large-Decoder NetYoushan Zhang, Brian D. Davison, Vivien W. Talghader et al.
Transmission electron microscopy (TEM) is one of the primary tools to show microstructural characterization of materials as well as film thickness. However, manual determination of film thickness from TEM images is time-consuming as well as subjective, especially when the films in question are very thin and the need for measurement precision is very high. Such is the case for head overcoat (HOC) thickness measurements in the magnetic hard disk drive industry. It is therefore necessary to develop software to automatically measure HOC thickness. In this paper, for the first time, we propose a HOC layer segmentation method using NASNet-Large as an encoder and then followed by a decoder architecture, which is one of the most commonly used architectures in deep learning for image segmentation. To further improve segmentation results, we are the first to propose a post-processing layer to remove irrelevant portions in the segmentation result. To measure the thickness of the segmented HOC layer, we propose a regressive convolutional neural network (RCNN) model as well as orthogonal thickness calculation methods. Experimental results demonstrate a higher dice score for our model which has lower mean squared error and outperforms current state-of-the-art manual measurement.
CVJun 22, 2021
Enhanced Separable Disentanglement for Unsupervised Domain AdaptationYoushan Zhang, Brian D. Davison
Domain adaptation aims to mitigate the domain gap when transferring knowledge from an existing labeled domain to a new domain. However, existing disentanglement-based methods do not fully consider separation between domain-invariant and domain-specific features, which means the domain-invariant features are not discriminative. The reconstructed features are also not sufficiently used during training. In this paper, we propose a novel enhanced separable disentanglement (ESD) model. We first employ a disentangler to distill domain-invariant and domain-specific features. Then, we apply feature separation enhancement processes to minimize contamination between domain-invariant and domain-specific features. Finally, our model reconstructs complete feature vectors, which are used for further disentanglement during the training phase. Extensive experiments from three benchmark datasets outperform state-of-the-art methods, especially on challenging cross-domain tasks.
CVMay 18, 2021
Correlated Adversarial Joint Discrepancy Adaptation NetworkYoushan Zhang, Brian D. Davison
Domain adaptation aims to mitigate the domain shift problem when transferring knowledge from one domain into another similar but different domain. However, most existing works rely on extracting marginal features without considering class labels. Moreover, some methods name their model as so-called unsupervised domain adaptation while tuning the parameters using the target domain label. To address these issues, we propose a novel approach called correlated adversarial joint discrepancy adaptation network (CAJNet), which minimizes the joint discrepancy of two domains and achieves competitive performance with tuning parameters using the correlated label. By training the joint features, we can align the marginal and conditional distributions between the two domains. In addition, we introduce a probability-based top-$\mathcal{K}$ correlated label ($\mathcal{K}$-label), which is a powerful indicator of the target domain and effective metric to tune parameters to aid predictions. Extensive experiments on benchmark datasets demonstrate significant improvements in classification accuracy over the state of the art.
IRMay 5, 2021
WTR: A Test Collection for Web Table RetrievalZhiyu Chen, Shuo Zhang, Brian D. Davison
We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, which uses a large-scale Web Table Corpora extracted from the Common Crawl. Since a Web table usually has rich context information such as the page title and surrounding paragraphs, we not only provide relevance judgments of query-table pairs, but also the relevance judgments of query-table context pairs with respect to a query, which are ignored by previous test collections. To facilitate future research with this benchmark, we provide details about how the dataset is pre-processed and also baseline results from both traditional and recently proposed table retrieval methods. Our experimental results show that proper usage of context labels can benefit previous table retrieval methods.
CVMay 5, 2021
Deep Spherical Manifold Gaussian Kernel for Unsupervised Domain AdaptationYoushan Zhang, Brian D. Davison
Unsupervised Domain adaptation is an effective method in addressing the domain shift issue when transferring knowledge from an existing richly labeled domain to a new domain. Existing manifold-based methods either are based on traditional models or largely rely on Grassmannian manifold via minimizing differences of single covariance matrices of two domains. In addition, existing pseudo-labeling algorithms inadequately consider the quality of pseudo labels in aligning the conditional distribution between two domains. In this work, a deep spherical manifold Gaussian kernel (DSGK) framework is proposed to map the source and target subspaces into a spherical manifold and reduce the discrepancy between them by embedding both extracted features and a Gaussian kernel. To align the conditional distributions, we further develop an easy-to-hard pseudo label refinement process to improve the quality of the pseudo labels and then reduce categorical spherical manifold Gaussian kernel geodesic loss. Extensive experimental results show that DSGK outperforms state-of-the-art methods, especially on challenging cross-domain learning tasks.
CVApr 27, 2021
Efficient Pre-trained Features and Recurrent Pseudo-Labeling in Unsupervised Domain AdaptationYoushan Zhang, Brian D. Davison
Domain adaptation (DA) mitigates the domain shift problem when transferring knowledge from one annotated domain to another similar but different unlabeled domain. However, existing models often utilize one of the ImageNet models as the backbone without exploring others, and fine-tuning or retraining the backbone ImageNet model is also time-consuming. Moreover, pseudo-labeling has been used to improve the performance in the target domain, while how to generate confident pseudo labels and explicitly align domain distributions has not been well addressed. In this paper, we show how to efficiently opt for the best pre-trained features from seventeen well-known ImageNet models in unsupervised DA problems. In addition, we propose a recurrent pseudo-labeling model using the best pre-trained features (termed PRPL) to improve classification performance. To show the effectiveness of PRPL, we evaluate it on three benchmark datasets, Office+Caltech-10, Office-31, and Office-Home. Extensive experiments show that our model reduces computation time and boosts the mean accuracy to 98.1%, 92.4%, and 81.2%, respectively, substantially outperforming the state of the art.
CVMar 10, 2021
Adversarial Regression Learning for Bone Age EstimationYoushan Zhang, Brian D. Davison
Estimation of bone age from hand radiographs is essential to determine skeletal age in diagnosing endocrine disorders and depicting the growth status of children. However, existing automatic methods only apply their models to test images without considering the discrepancy between training samples and test samples, which will lead to a lower generalization ability. In this paper, we propose an adversarial regression learning network (ARLNet) for bone age estimation. Specifically, we first extract bone features from a fine-tuned Inception V3 neural network and propose regression percentage loss for training. To reduce the discrepancy between training and test data, we then propose adversarial regression loss and feature reconstruction loss to guarantee the transition from training data to test data and vice versa, preserving invariant features from both training and test data. Experimental results show that the proposed model outperforms state-of-the-art methods.
IRFeb 23, 2021
Neural ranking models for document retrievalMohamed Trabelsi, Zhiyu Chen, Brian D. Davison et al.
Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, so that they overcome the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components, and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images and videos.
CVSep 19, 2020
Adversarial Consistent Learning on Partial Domain Adaptation of PlantCLEF 2020 ChallengeYoushan Zhang, Brian D. Davison
Domain adaptation is one of the most crucial techniques to mitigate the domain shift problem, which exists when transferring knowledge from an abundant labeled sourced domain to a target domain with few or no labels. Partial domain adaptation addresses the scenario when target categories are only a subset of source categories. In this paper, to enable the efficient representation of cross-domain plant images, we first extract deep features from pre-trained models and then develop adversarial consistent learning ($ACL$) in a unified deep architecture for partial domain adaptation. It consists of source domain classification loss, adversarial learning loss, and feature consistency loss. Adversarial learning loss can maintain domain-invariant features between the source and target domains. Moreover, feature consistency loss can preserve the fine-grained feature transition between two domains. We also find the shared categories of two domains via down-weighting the irrelevant categories in the source domain. Experimental results demonstrate that training features from NASNetLarge model with proposed $ACL$ architecture yields promising results on the PlantCLEF 2020 Challenge.
IRMay 19, 2020
Table Search Using a Deep Contextualized Language ModelZhiyu Chen, Mohamed Trabelsi, Jeff Heflin et al.
Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We investigate how to encode table content considering the table structure and input length limit of BERT. We also propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT. In experiments on public datasets, we show that our best approach can outperform the previous state-of-the-art method and BERT baselines with a large margin under different evaluation metrics.
CVFeb 6, 2020
Impact of ImageNet Model Selection on Domain AdaptationYoushan Zhang, Brian D. Davison
Deep neural networks are widely used in image classification problems. However, little work addresses how features from different deep neural networks affect the domain adaptation problem. Existing methods often extract deep features from one ImageNet model, without exploring other neural networks. In this paper, we investigate how different ImageNet models affect transfer accuracy on domain adaptation problems. We extract features from sixteen distinct pre-trained ImageNet models and examine the performance of twelve benchmarking methods when using the features. Extensive experimental results show that a higher accuracy ImageNet model produces better features, and leads to higher accuracy on domain adaptation problems (with a correlation coefficient of up to 0.95). We also examine the architecture of each neural network to find the best layer for feature extraction. Together, performance from our features exceeds that of the state-of-the-art in three benchmark datasets.
IRJan 27, 2020
Leveraging Schema Labels to Enhance Dataset SearchZhiyu Chen, Haiyan Jia, Jeff Heflin et al.
A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text.We propose a novel schema label generation model which generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model which not only considers the relevance between the query and dataset metadata but also the similarity between the query and generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can effectively improve the precision and NDCG scores of the dataset retrieval task compared with baseline methods. We also test on a collection of Wikipedia tables to show that the features generated from schema labels can improve the unsupervised and supervised web table retrieval task as well.
CVApr 4, 2019
Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNetYoushan Zhang, Brian D. Davison
Deep neural networks have been widely used in computer vision. There are several well trained deep neural networks for the ImageNet classification challenge, which has played a significant role in image recognition. However, little work has explored pre-trained neural networks for image recognition in domain adaption. In this paper, we are the first to extract better-represented features from a pre-trained Inception ResNet model for domain adaptation. We then present a modified distribution alignment method for classification using the extracted features. We test our model using three benchmark datasets (Office+Caltech-10, Office-31, and Office-Home). Extensive experiments demonstrate significant improvements (4.8%, 5.5%, and 10%) in classification accuracy over the state-of-the-art.