Dashan Gao

CV
h-index5
22papers
892citations
Novelty45%
AI Score49

22 Papers

CVMar 22, 2023
Dense Network Expansion for Class Incremental Learning

Zhiyuan Hu, Yunsheng Li, Jiancheng Lyu et al.

The problem of class incremental learning (CIL) is considered. State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task. While effective from a computational standpoint, these methods lead to models that grow quickly with the number of tasks. A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. This is accomplished by the introduction of dense connections between the intermediate layers of the task expert networks, that enable the transfer of knowledge from old to new tasks via feature sharing and reusing. This sharing is implemented with a cross-task attention mechanism, based on a new task attention block (TAB), that fuses information across tasks. Unlike traditional attention mechanisms, TAB operates at the level of the feature mixing and is decoupled with spatial attentions. This is shown more effective than a joint spatial-and-task attention for CIL. The proposed DNE approach can strictly maintain the feature space of old classes while growing the network and feature scale at a much slower rate than previous methods. In result, it outperforms the previous SOTA methods by a margin of 4\% in terms of accuracy, with similar or even smaller model scale.

CVJun 14, 2023
POP: Prompt Of Prompts for Continual Learning

Zhiyuan Hu, Jiancheng Lyu, Dashan Gao et al.

Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.

CVMar 30, 2023
MobileInst: Video Instance Segmentation on the Mobile

Renhong Zhang, Tianheng Cheng, Shusheng Yang et al.

Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of Snapdragon 778G Mobile Platform, without other methods of acceleration. On the COCO dataset, MobileInst achieves 31.2 mask AP and 433 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

LGOct 10, 2022
A Survey on Heterogeneous Federated Learning

Dashan Gao, Xin Yao, Qiang Yang

Federated learning (FL) has been proposed to protect data privacy and virtually assemble the isolated data silos by cooperatively training models among organizations without breaching privacy and security. However, FL faces heterogeneity from various aspects, including data space, statistical, and system heterogeneity. For example, collaborative organizations without conflict of interest often come from different areas and have heterogeneous data from different feature spaces. Participants may also want to train heterogeneous personalized local models due to non-IID and imbalanced data distribution and various resource-constrained devices. Therefore, heterogeneous FL is proposed to address the problem of heterogeneity in FL. In this survey, we comprehensively investigate the domain of heterogeneous FL in terms of data space, statistical, system, and model heterogeneity. We first give an overview of FL, including its definition and categorization. Then, We propose a precise taxonomy of heterogeneous FL settings for each type of heterogeneity according to the problem setting and learning objective. We also investigate the transfer learning methodologies to tackle the heterogeneity in FL. We further present the applications of heterogeneous FL. Finally, we highlight the challenges and opportunities and envision promising future research directions toward new framework design and trustworthy approaches.

84.2CVMar 24
ForeSea: AI Forensic Search with Multi-modal Queries for Video Surveillance

Hyojin Park, Yi Li, Janghoon Cho et al.

Despite decades of work, surveillance still struggles to find specific targets across long, multi-camera video. Prior methods -- tracking pipelines, CLIP based models, and VideoRAG -- require heavy manual filtering, capture only shallow attributes, and fail at temporal reasoning. Real-world searches are inherently multimodal (e.g., "When does this person join the fight?" with the person's image), yet this setting remains underexplored. Also, there are no proper benchmarks to evaluate those setting - asking video with multimodal queries. To address this gap, we introduce ForeSeaQA, a new benchmark specifically designed for video QA with image-and-text queries and timestamped annotations of key events. The dataset consists of long-horizon surveillance footage paired with diverse multimodal questions, enabling systematic evaluation of retrieval, temporal grounding, and multimodal reasoning in realistic forensic conditions. Not limited to this benchmark, we propose ForeSea, an AI forensic search system with a 3-stage, plug-and-play pipeline. (1) A tracking module filters irrelevant footage; (2) a multimodal embedding module indexes the remaining clips; and (3) during inference, the system retrieves top-K candidate clips for a Video Large Language Model (VideoLLM) to answer queries and localize events. On ForeSeaQA, ForeSea improves accuracy by 3.5% and temporal IoU by 11.0 over prior VideoRAG models. To our knowledge, ForeSeaQA is the first benchmark to support complex multimodal queries with precise temporal grounding, and ForeSea is the first VideoRAG system built to excel in this setting.

CVJun 13, 2025Code
EgoPrivacy: What Your First-Person Camera Says About You?

Yijiang Li, Genpei Zhang, Jiacheng Cheng et al.

While the rapid proliferation of wearable cameras has raised significant concerns about egocentric video privacy, prior work has largely overlooked the unique privacy threats posed to the camera wearer. This work investigates the core question: How much privacy information about the camera wearer can be inferred from their first-person view videos? We introduce EgoPrivacy, the first large-scale benchmark for the comprehensive evaluation of privacy risks in egocentric vision. EgoPrivacy covers three types of privacy (demographic, individual, and situational), defining seven tasks that aim to recover private information ranging from fine-grained (e.g., wearer's identity) to coarse-grained (e.g., age group). To further emphasize the privacy threats inherent to egocentric vision, we propose Retrieval-Augmented Attack, a novel attack strategy that leverages ego-to-exo retrieval from an external pool of exocentric videos to boost the effectiveness of demographic privacy attacks. An extensive comparison of the different attacks possible under all threat models is presented, showing that private information of the wearer is highly susceptible to leakage. For instance, our findings indicate that foundation models can effectively compromise wearer privacy even in zero-shot settings by recovering attributes such as identity, scene, gender, and race with 70-80% accuracy. Our code and data are available at https://github.com/williamium3000/ego-privacy.

LGApr 8, 2023
Robust Deep Learning Models Against Semantic-Preserving Adversarial Attack

Dashan Gao, Yunce Zhao, Yinghua Yao et al.

Deep learning models can be fooled by small $l_p$-norm adversarial perturbations and natural perturbations in terms of attributes. Although the robustness against each perturbation has been explored, it remains a challenge to address the robustness against joint perturbations effectively. In this paper, we study the robustness of deep learning models against joint perturbations by proposing a novel attack mechanism named Semantic-Preserving Adversarial (SPA) attack, which can then be used to enhance adversarial training. Specifically, we introduce an attribute manipulator to generate natural and human-comprehensible perturbations and a noise generator to generate diverse adversarial noises. Based on such combined noises, we optimize both the attribute value and the diversity variable to generate jointly-perturbed samples. For robust training, we adversarially train the deep learning model against the generated joint perturbations. Empirical results on four benchmarks show that the SPA attack causes a larger performance decline with small $l_{\infty}$ norm-ball constraints compared to existing approaches. Furthermore, our SPA-enhanced training outperforms existing defense methods against such joint perturbations.

95.2ROMay 4
Latent Bridge: Feature Delta Prediction for Efficient Dual-System Vision-Language-Action Model Inference

Yudong Liu, Yuan Li, Zijia Tang et al.

Dual-system Vision-Language-Action (VLA) models achieve state-of-the-art robotic manipulation but are bottlenecked by the VLM backbone, which must execute at every control step while producing temporally redundant features. We propose Latent Bridge, a lightweight model that predicts VLM output deltas between timesteps, enabling the action head to operate on predicted outputs while the expensive VLM backbone is called only periodically. We instantiate Latent Bridge on two architecturally distinct VLAs: GR00T-N1.6 (feature-space bridge) and π0.5 (KV-cache bridge), demonstrating that the approach generalizes across VLA designs. Our task-agnostic DAgger training pipeline transfers across benchmarks without modification. Across four LIBERO suites, 24 RoboCasa kitchen tasks, and the ALOHA sim transfer-cube task, Latent Bridge achieves 95-100% performance retention while reducing VLM calls by 50-75%, yielding 1.65-1.73x net per-episode speedup.

IRMay 9, 2023
FedPDD: A Privacy-preserving Double Distillation Framework for Cross-silo Federated Recommendation

Sheng Wan, Dashan Gao, Hanlin Gu et al.

Cross-platform recommendation aims to improve recommendation accuracy by gathering heterogeneous features from different platforms. However, such cross-silo collaborations between platforms are restricted by increasingly stringent privacy protection regulations, thus data cannot be aggregated for training. Federated learning (FL) is a practical solution to deal with the data silo problem in recommendation scenarios. Existing cross-silo FL methods transmit model information to collaboratively build a global model by leveraging the data of overlapped users. However, in reality, the number of overlapped users is often very small, thus largely limiting the performance of such approaches. Moreover, transmitting model information during training requires high communication costs and may cause serious privacy leakage. In this paper, we propose a novel privacy-preserving double distillation framework named FedPDD for cross-silo federated recommendation, which efficiently transfers knowledge when overlapped users are limited. Specifically, our double distillation strategy enables local models to learn not only explicit knowledge from the other party but also implicit knowledge from its past predictions. Moreover, to ensure privacy and high efficiency, we employ an offline training scheme to reduce communication needs and privacy leakage risk. In addition, we adopt differential privacy to further protect the transmitted information. The experiments on two real-world recommendation datasets, HetRec-MovieLens and Criteo, demonstrate the effectiveness of FedPDD compared to the state-of-the-art approaches.

CRJul 3, 2020
Privacy Threats Against Federated Matrix Factorization

Dashan Gao, Ben Tan, Ce Ju et al.

Matrix Factorization has been very successful in practical recommendation applications and e-commerce. Due to data shortage and stringent regulations, it can be hard to collect sufficient data to build performant recommender systems for a single company. Federated learning provides the possibility to bridge the data silos and build machine learning models without compromising privacy and security. Participants sharing common users or items collaboratively build a model over data from all the participants. There have been some works exploring the application of federated learning to recommender systems and the privacy issues in collaborative filtering systems. However, the privacy threats in federated matrix factorization are not studied. In this paper, we categorize federated matrix factorization into three types based on the partition of feature space and analyze privacy threats against each type of federated matrix factorization model. We also discuss privacy-preserving approaches. As far as we are aware, this is the first study of privacy threats of the matrix factorization method in the federated learning framework.

LGJun 15, 2020
Privacy-Preserving Technology to Help Millions of People: Federated Prediction Model for Stroke Prevention

Ce Ju, Ruihui Zhao, Jichao Sun et al.

Prevention of stroke with its associated risk factors has been one of the public health priorities worldwide. Emerging artificial intelligence technology is being increasingly adopted to predict stroke. Because of privacy concerns, patient data are stored in distributed electronic health record (EHR) databases, voluminous clinical datasets, which prevent patient data from being aggregated and restrains AI technology to boost the accuracy of stroke prediction with centralized training data. In this work, our scientists and engineers propose a privacy-preserving scheme to predict the risk of stroke and deploy our federated prediction model on cloud servers. Our system of federated prediction model asynchronously supports any number of client connections and arbitrary local gradient iterations in each communication round. It adopts federated averaging during the model training process, without patient data being taken out of the hospitals during the whole process of model training and forecasting. With the privacy-preserving mechanism, our federated prediction model trains over all the healthcare data from hospitals in a certain city without actual data sharing among them. Therefore, it is not only secure but also more accurate than any single prediction model that trains over the data only from one single hospital. Especially for small hospitals with few confirmed stroke cases, our federated model boosts model performance by 10%~20% in several machine learning metrics. To help stroke experts comprehend the advantage of our prediction system more intuitively, we developed a mobile app that collects the key information of patients' statistics and demonstrates performance comparisons between the federated prediction model and the single prediction model during the federated training process.

LGApr 26, 2020
Federated Transfer Learning for EEG Signal Classification

Ce Ju, Dashan Gao, Ravikiran Mane et al.

The success of deep learning (DL) methods in the Brain-Computer Interfaces (BCI) field for classification of electroencephalographic (EEG) recordings has been restricted by the lack of large datasets. Privacy concerns associated with EEG signals limit the possibility of constructing a large EEG-BCI dataset by the conglomeration of multiple small ones for jointly training machine learning models. Hence, in this paper, we propose a novel privacy-preserving DL architecture named federated transfer learning (FTL) for EEG classification that is based on the federated learning framework. Working with the single-trial covariance matrix, the proposed architecture extracts common discriminative information from multi-subject EEG data with the help of domain adaptation techniques. We evaluate the performance of the proposed architecture on the PhysioNet dataset for 2-class motor imagery classification. While avoiding the actual data sharing, our FTL approach achieves 2% higher classification accuracy in a subject-adaptive analysis. Also, in the absence of multi-subject data, our architecture provides 6% better accuracy compared to other state-of-the-art DL architectures.

IVApr 4, 2020
Volumetric Attention for 3D Medical Image Segmentation and Detection

Xudong Wang, Shizhong Han, Yunqiang Chen et al.

A volumetric attention(VA) module for 3D medical image segmentation and detection is proposed. VA attention is inspired by recent advances in video processing, enables 2.5D networks to leverage context information along the z direction, and allows the use of pretrained 2D detection models when training data is limited, as is often the case for medical applications. Its integration in the Mask R-CNN is shown to enable state-of-the-art performance on the Liver Tumor Segmentation (LiTS) Challenge, outperforming the previous challenge winner by 3.9 points and achieving top performance on the LiTS leader board at the time of paper submission. Detection experiments on the DeepLesion dataset also show that the addition of VA to existing object detectors enables a 69.1 sensitivity at 0.5 false positive per image, outperforming the best published results by 6.6 points.

IVFeb 11, 2020
Validating uncertainty in medical image translation

Jacob C. Reinhold, Yufan He, Shizhong Han et al.

Medical images are increasingly used as input to deep neural networks to produce quantitative values that aid researchers and clinicians. However, standard deep neural networks do not provide a reliable measure of uncertainty in those quantitative values. Recent work has shown that using dropout during training and testing can provide estimates of uncertainty. In this work, we investigate using dropout to estimate epistemic and aleatoric uncertainty in a CT-to-MR image translation task. We show that both types of uncertainty are captured, as defined, providing confidence in the output uncertainty estimates.

IVFeb 11, 2020
Finding novelty with uncertainty

Jacob C. Reinhold, Yufan He, Shizhong Han et al.

Medical images are often used to detect and characterize pathology and disease; however, automatically identifying and segmenting pathology in medical images is challenging because the appearance of pathology across diseases varies widely. To address this challenge, we propose a Bayesian deep learning method that learns to translate healthy computed tomography images to magnetic resonance images and simultaneously calculates voxel-wise uncertainty. Since high uncertainty occurs in pathological regions of the image, this uncertainty can be used for unsupervised anomaly segmentation. We show encouraging experimental results on an unsupervised anomaly segmentation task by combining two types of uncertainty into a novel quantity we call scibilic uncertainty.

IVFeb 10, 2020
Validation and Optimization of Multi-Organ Segmentation on Clinical Imaging Archives

Yuchen Xu, Olivia Tang, Yucheng Tang et al.

Segmentation of abdominal computed tomography(CT) provides spatial context, morphological properties, and a framework for tissue-specific radiomics to guide quantitative Radiological assessment. A 2015 MICCAI challenge spurred substantial innovation in multi-organ abdominal CT segmentation with both traditional and deep learning methods. Recent innovations in deep methods have driven performance toward levels for which clinical translation is appealing. However, continued cross-validation on open datasets presents the risk of indirect knowledge contamination and could result in circular reasoning. Moreover, 'real world' segmentations can be challenging due to the wide variability of abdomen physiology within patients. Herein, we perform two data retrievals to capture clinically acquired deidentified abdominal CT cohorts with respect to a recently published variation on 3D U-Net (baseline algorithm). First, we retrieved 2004 deidentified studies on 476 patients with diagnosis codes involving spleen abnormalities (cohort A). Second, we retrieved 4313 deidentified studies on 1754 patients without diagnosis codes involving spleen abnormalities (cohort B). We perform prospective evaluation of the existing algorithm on both cohorts, yielding 13% and 8% failure rate, respectively. Then, we identified 51 subjects in cohort A with segmentation failures and manually corrected the liver and gallbladder labels. We re-trained the model adding the manual labels, resulting in performance improvement of 9% and 6% failure rate for the A and B cohorts, respectively. In summary, the performance of the baseline on the prospective cohorts was similar to that on previously published datasets. Moreover, adding data from the first cohort substantively improved performance when evaluated on the second withheld validation cohort.

CVFeb 10, 2020
Outlier Guided Optimization of Abdominal Segmentation

Yuchen Xu, Olivia Tang, Yucheng Tang et al.

Abdominal multi-organ segmentation of computed tomography (CT) images has been the subject of extensive research interest. It presents a substantial challenge in medical image processing, as the shape and distribution of abdominal organs can vary greatly among the population and within an individual over time. While continuous integration of novel datasets into the training set provides potential for better segmentation performance, collection of data at scale is not only costly, but also impractical in some contexts. Moreover, it remains unclear what marginal value additional data have to offer. Herein, we propose a single-pass active learning method through human quality assurance (QA). We built on a pre-trained 3D U-Net model for abdominal multi-organ segmentation and augmented the dataset either with outlier data (e.g., exemplars for which the baseline algorithm failed) or inliers (e.g., exemplars for which the baseline algorithm worked). The new models were trained using the augmented datasets with 5-fold cross-validation (for outlier data) and withheld outlier samples (for inlier data). Manual labeling of outliers increased Dice scores with outliers by 0.130, compared to an increase of 0.067 with inliers (p<0.001, two-tailed paired t-test). By adding 5 to 37 inliers or outliers to training, we find that the marginal value of adding outliers is higher than that of adding inliers. In summary, improvement on single-organ performance was obtained without diminishing multi-organ performance or significantly increasing training time. Hence, identification and correction of baseline failure cases present an effective and efficient method of selecting training data to improve algorithm performance.

IVDec 1, 2019
Stochastic tissue window normalization of deep learning on computed tomography

Yuankai Huo, Yucheng Tang, Yunqiang Chen et al.

Tissue window filtering has been widely used in deep learning for computed tomography (CT) image analyses to improve training performance (e.g., soft tissue windows for abdominal CT). However, the effectiveness of tissue window normalization is questionable since the generalizability of the trained model might be further harmed, especially when such models are applied to new cohorts with different CT reconstruction kernels, contrast mechanisms, dynamic variations in the acquisition, and physiological changes. We evaluate the effectiveness of both with and without using soft tissue window normalization on multisite CT cohorts. Moreover, we propose a stochastic tissue window normalization (SWN) method to improve the generalizability of tissue window normalization. Different from the random sampling, the SWN method centers the randomization around the soft tissue window to maintain the specificity for abdominal organs. To evaluate the performance of different strategies, 80 training and 453 validation and testing scans from six datasets are employed to perform multi-organ segmentation using standard 2D U-Net. The six datasets cover the scenarios, where the training and testing scans are from (1) same scanner and same population, (2) same CT contrast but different pathology, and (3) different CT contrast and pathology. The traditional soft tissue window and nonwindowed approaches achieved better performance on (1). The proposed SWN achieved general superior performance on (2) and (3) with statistical analyses, which offers better generalizability for a trained model.

IVNov 14, 2019
Contrast Phase Classification with a Generative Adversarial Network

Yucheng Tang, Ho Hin Lee, Yuchen Xu et al.

Dynamic contrast enhanced computed tomography (CT) is an imaging technique that provides critical information on the relationship of vascular structure and dynamics in the context of underlying anatomy. A key challenge for image processing with contrast enhanced CT is that phase discrepancies are latent in different tissues due to contrast protocols, vascular dynamics, and metabolism variance. Previous studies with deep learning frameworks have been proposed for classifying contrast enhancement with networks inspired by computer vision. Here, we revisit the challenge in the context of whole abdomen contrast enhanced CTs. To capture and compensate for the complex contrast changes, we propose a novel discriminator in the form of a multi-domain disentangled representation learning network. The goal of this network is to learn an intermediate representation that separates contrast enhancement from anatomy and enables classification of images with varying contrast time. Briefly, our unpaired contrast disentangling GAN(CD-GAN) Discriminator follows the ResNet architecture to classify a CT scan from different enhancement phases. To evaluate the approach, we trained the enhancement phase classifier on 21060 slices from two clinical cohorts of 230 subjects. Testing was performed on 9100 slices from 30 independent subjects who had been imaged with CT scans from all contrast phases. Performance was quantified in terms of the multi-class normalized confusion matrix. The proposed network significantly improved correspondence over baseline UNet, ResNet50 and StarGAN performance of accuracy scores 0.54. 0.55, 0.62 and 0.91, respectively. The proposed discriminator from the disentangled network presents a promising technique that may allow deeper modeling of dynamic imaging against patient specific anatomies.

IVNov 12, 2019
Semi-Supervised Multi-Organ Segmentation through Quality Assurance Supervision

Ho Hin Lee, Yucheng Tang, Olivia Tang et al.

Human in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that the systems are performing as intended, as well as identifying and excluding outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores can be generatedIn this paper, we propose a semi-supervised multi-organ segmentation deep neural network consisting of a traditional segmentation model generator and a QA involved discriminator. A large-scale dataset of 2027 volumes are used to train the generator, whose 2-D montage images and segmentation mask with QA scores are used to train the discriminator. To generate the QA scores, the 2-D montage images were reviewed manually and coded 0 (success), 1 (errors consistent with published performance), and 2 (gross failure). Then, the ResNet-18 network was trained with 1623 montage images in equal distribution of all three code labels and achieved an accuracy 94% for classification predictions with 404 montage images withheld for the test cohort. To assess the performance of using the QA supervision, the discriminator was used as a loss function in a multi-organ segmentation pipeline. The inclusion of QA-loss function boosted performance on the unlabeled test dataset from 714 patients to 951 patients over the baseline model. Additionally, the number of failures decreased from 606 (29.90%) to 402 (19.83%). The contributions of the proposed method are threefold: We show that (1) the QA scores can be used as a loss function to perform semi-supervised learning for unlabeled data, (2) the well trained discriminator is learnt by QA score rather than traditional true/false, and (3) the performance of multi-organ segmentation on unlabeled datasets can be fine-tuned with more robust and higher accuracy than the original baseline method.

SPSep 11, 2019
HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography

Dashan Gao, Ce Ju, Xiguang Wei et al.

Electroencephalography (EEG) classification techniques have been widely studied for human behavior and emotion recognition tasks. But it is still a challenging issue since the data may vary from subject to subject, may change over time for the same subject, and maybe heterogeneous. Recent years, increasing privacy-preserving demands poses new challenges to this task. The data heterogeneity, as well as the privacy constraint of the EEG data, is not concerned in previous studies. To fill this gap, in this paper, we propose a heterogeneous federated learning approach to train machine learning models over heterogeneous EEG data, while preserving the data privacy of each party. To verify the effectiveness of our approach, we conduct experiments on a real-world EEG dataset, consisting of heterogeneous data collected from diverse devices. Our approach achieves consistent performance improvement on every task.

CVApr 9, 2019
Towards Universal Object Detection by Domain Attention

Xudong Wang, Zhaowei Cai, Dashan Gao et al.

Despite increasing efforts on universal representations for visual recognition, few have addressed object detection. In this paper, we develop an effective and efficient universal object detection system that is capable of working on various image domains, from human faces and traffic signs to medical CT images. Unlike multi-domain models, this universal model does not require prior knowledge of the domain of interest. This is achieved by the introduction of a new family of adaptation layers, based on the principles of squeeze and excitation, and a new domain-attention mechanism. In the proposed universal detector, all parameters and computations are shared across domains, and a single network processes all domains all the time. Experiments, on a newly established universal object detection benchmark of 11 diverse datasets, show that the proposed detector outperforms a bank of individual detectors, a multi-domain detector, and a baseline universal detector, with a 1.3x parameter increase over a single-domain baseline detector. The code and benchmark will be released at http://www.svcl.ucsd.edu/projects/universal-detection/.