Siqi Deng

h-index7

6papers

109citations

Novelty53%

AI Score33

Ranked #121,007 of 194,257 authors (top 62%)#40,233 in CV (top 68%)

6 Papers

2.0LGJun 26, 2023

Balanced Filtering via Disclosure-Controlled Proxies

Siqi Deng, Emily Diana, Michael Kearns et al.

We study the problem of collecting a cohort or set that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Specifically, our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we study a learner that can use a small set of labeled data to train a proxy function that can later be used for this filtering or selection task. We then associate the range of the proxy function with sampling probabilities; given a new example, we classify it using our proxy function and then select it with probability corresponding to its proxy classification. Importantly, we require that the proxy classification does not reveal significantly more information about the sensitive group membership of any individual example compared to population base rates alone (i.e., the level of disclosure should be controlled) and show that we can find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze its generalization properties.

12.8CVMar 18, 2024Code

Align and Distill: Unifying and Improving Domain Adaptive Object Detection

Justin Kay, Timm Haucke, Suzanne Stathatos et al. · mit

Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +0.6 AP50 on CFC Kenai to Channel. ALDI and ALDI++ are architecture-agnostic, setting a new state-of-the-art for YOLO and DETR-based DAOD as well without additional hyperparameter tuning. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting.

5.0CVMay 11, 2023Code

Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts

Zhaoyang Zhang, Yantao Shen, Kunyu Shi et al.

We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). With rich and structured information such as task input/output format, TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.

6.5CVJun 8, 2021

Harnessing Unrecognizable Faces for Improving Face Recognition

Siqi Deng, Yuanjun Xiong, Meng Wang et al.

The common implementation of face recognition systems as a cascade of a detection stage and a recognition or verification stage can cause problems beyond failures of the detector. When the detector succeeds, it can detect faces that cannot be recognized, no matter how capable the recognition system. Recognizability, a latent variable, should therefore be factored into the design and implementation of face recognition systems. We propose a measure of recognizability of a face image that leverages a key empirical observation: an embedding of face images, implemented by a deep neural network trained using mostly recognizable identities, induces a partition of the hypersphere whereby unrecognizable identities cluster together. This occurs regardless of the phenomenon that causes a face to be unrecognizable, it be optical or motion blur, partial occlusion, spatial quantization, poor illumination. Therefore, we use the distance from such an "unrecognizable identity" as a measure of recognizability, and incorporate it in the design of the over-all system. We show that accounting for recognizability reduces error rate of single-image face recognition by 58% at FAR=1e-5 on the IJB-C Covariate Verification benchmark, and reduces verification error rate by 24% at FAR=1e-5 in set-based recognition on the IJB-C benchmark.

16.6CVNov 18, 2020

Positive-Congruent Training: Towards Regression-Free Model Updates

Sijie Yan, Yuanjun Xiong, Kaustav Kundu et al.

Reducing inconsistencies in the behavior of different versions of an AI system can be as important in practice as reducing its overall error. In image classification, sample-wise inconsistencies appear as "negative flips": A new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model. Positive-congruent (PC) training aims at reducing error rate while at the same time reducing negative flips, thus maximizing congruency with the reference model only on positive predictions, unlike model distillation. We propose a simple approach for PC training, Focal Distillation, which enforces congruence with the reference model by giving more weights to samples that were correctly classified. We also found that, if the reference model itself can be chosen as an ensemble of multiple deep neural networks, negative flips can be further reduced without affecting the new model's accuracy.

5.0CVOct 30, 2020

SMOT: Single-Shot Multi Object Tracking

Wei Li, Yuanjun Xiong, Shuo Yang et al.

We present single-shot multi-object tracker (SMOT), a new tracking framework that converts any single-shot detector (SSD) model into an online multiple object tracker, which emphasizes simultaneously detecting and tracking of the object paths. Contrary to the existing tracking by detection approaches which suffer from errors made by the object detectors, SMOT adopts the recently proposed scheme of tracking by re-detection. We combine this scheme with SSD detectors by proposing a novel tracking anchor assignment module. With this design SMOT is able to generate tracklets with a constant per-frame runtime. A light-weighted linkage algorithm is then used for online tracklet linking. On three benchmarks of object tracking: Hannah, Music Videos, and MOT17, the proposed SMOT achieves state-of-the-art performance.