Eric Lo

h-index: 22

Papers: 5

Citations: 98

Novelty: 47%

AI Score: 45

Ranked #42,190 of 194,257 authors (top 22%); #14,823 in CV (top 25%)

5 Papers

7.3 | CV | Mar 22, 2022 | Code

Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition

Zhisheng Zhong, Jiequan Cui, Zeming Li et al.

Deep neural networks perform poorly on heavily class-imbalanced datasets. Given the promising performance of contrastive learning, we propose Rebalanced Siamese Contrastive Mining (ResCom) to tackle imbalanced recognition. Based on mathematical analysis and simulation results, we claim that supervised contrastive learning suffers from a dual class-imbalance problem at both the original batch and Siamese batch levels, which is more serious than in long-tailed classification learning. In this paper, at the original batch level, we introduce a class-balanced supervised contrastive loss that assigns adaptive weights to different classes. At the Siamese batch level, we present a class-balanced queue, which maintains the same number of keys for all classes. Furthermore, we note that the gradient of the imbalanced contrastive loss with respect to the contrastive logits can be decoupled into positive and negative parts, and that easy positives and easy negatives make the contrastive gradient vanish. We propose supervised hard positive and negative pair mining to select informative pairs for contrastive computation and improve representation learning. Finally, to approximately maximize the mutual information between the two views, we propose Siamese Balanced Softmax and combine it with the contrastive loss for one-stage training. Extensive experiments demonstrate that ResCom outperforms previous methods by large margins on multiple long-tailed recognition benchmarks. Our code and models are publicly available at: https://github.com/dvlab-research/ResCom.
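The class-balanced queue described in this abstract can be pictured with a minimal sketch. It assumes one fixed-size FIFO buffer per class; the class name `ClassBalancedQueue` and its methods are hypothetical and are not taken from the ResCom repository.

```python
from collections import deque


class ClassBalancedQueue:
    """Hypothetical sketch: one bounded FIFO of keys per class."""

    def __init__(self, num_classes, keys_per_class):
        # deque(maxlen=...) silently drops the oldest key when full,
        # so no class can ever hold more than keys_per_class keys.
        self.queues = [deque(maxlen=keys_per_class) for _ in range(num_classes)]

    def enqueue(self, keys, labels):
        # Route every key to the buffer of its own class.
        for key, label in zip(keys, labels):
            self.queues[label].append(key)

    def counts(self):
        return [len(q) for q in self.queues]


queue = ClassBalancedQueue(num_classes=3, keys_per_class=2)
# A long-tailed batch: class 0 dominates.
queue.enqueue(
    keys=["a0", "a1", "a2", "a3", "b0", "c0"],
    labels=[0, 0, 0, 0, 1, 2],
)
print(queue.counts())  # [2, 1, 1]
```

In this sketch the head class is capped at two keys, and every class ends up with the same number of keys once each has contributed at least `keys_per_class` samples.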

6.6 | DB | Apr 1

Compass: General Filtered Search across Vector and Structured Data

Chunxiao Ye, Xiao Yan, Eric Lo

The increasing prevalence of hybrid vector and relational data necessitates efficient, general support for queries that combine high-dimensional vector search with complex relational filtering. However, existing filtered search solutions are fundamentally limited by specialized indices, which restrict arbitrary filtering and hinder integration with general-purpose DBMSs. This work introduces Compass, a unified framework that enables general filtered search across vector and structured data without relying on new index designs. Compass leverages established index structures (such as HNSW and IVF for vector attributes, and B+-trees for relational attributes) and implements a principled cooperative query execution strategy that coordinates candidate generation and predicate evaluation across modalities. Uniquely, Compass maintains generality by allowing arbitrary conjunctions, disjunctions, and range predicates, while remaining robust even with highly selective or multi-attribute filters. Comprehensive empirical evaluations demonstrate that Compass consistently outperforms NaviX, the only existing performant general framework, across diverse hybrid query workloads. It also matches the query throughput of specialized single-attribute indices in the settings that favor them, where only a single attribute is involved, all while maintaining full generality and DBMS compatibility. Overall, Compass offers a practical and robust solution for achieving truly general filtered search in vector database systems.
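To make the query type concrete, here is a toy sketch of filtered nearest-neighbor search with an arbitrary predicate. It is not Compass's execution strategy: a brute-force distance sort stands in for an HNSW or IVF index, and the function name `filtered_knn`, the row layout, and the example data are all hypothetical.

```python
import math


def filtered_knn(rows, query, predicate, k):
    """Return ids of the k nearest rows that satisfy the predicate."""
    # Candidate generation: visit rows from nearest to farthest.
    # (Brute-force sort here; a real system would walk a vector index.)
    candidates = sorted(rows, key=lambda r: math.dist(r["vec"], query))
    result = []
    for row in candidates:
        # Predicate evaluation: any Boolean function of the structured fields.
        if predicate(row):
            result.append(row["id"])
            if len(result) == k:
                break
    return result


rows = [
    {"id": 1, "vec": (0.0, 0.0), "price": 10, "brand": "a"},
    {"id": 2, "vec": (0.1, 0.0), "price": 50, "brand": "b"},
    {"id": 3, "vec": (0.2, 0.0), "price": 25, "brand": "a"},
    {"id": 4, "vec": (5.0, 5.0), "price": 15, "brand": "c"},
]


def in_range_a_or_c(row):
    # A range predicate AND an equality, OR another equality.
    return (10 <= row["price"] <= 30 and row["brand"] == "a") or row["brand"] == "c"


top2 = filtered_knn(rows, (0.0, 0.0), in_range_a_or_c, k=2)
print(top2)  # [1, 3]
```

With a highly selective predicate this naive loop scans many non-matching candidates, which is the robustness problem the abstract says Compass addresses.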

27.2 | CV | Jan 7, 2025 | Code

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang, Yaoyang Liu, Bin Xia et al.

We present Magic Mirror, a framework for generating identity-preserved videos with cinematic-level quality and dynamic motion. While recent advances in video diffusion models have shown impressive capabilities in text-to-video generation, maintaining consistent identity while producing natural motion remains challenging. Previous methods either require person-specific fine-tuning or struggle to balance identity preservation with motion diversity. Built upon Video Diffusion Transformers, our method introduces three key components: (1) a dual-branch facial feature extractor that captures both identity and structural features, (2) a lightweight cross-modal adapter with Conditioned Adaptive Normalization for efficient identity integration, and (3) a two-stage training strategy combining synthetic identity pairs with video data. Extensive experiments demonstrate that Magic Mirror effectively balances identity consistency with natural motion, outperforming existing methods across multiple metrics while adding only a minimal number of parameters. The code and model will be made publicly available at: https://github.com/dvlab-research/MagicMirror/

20.6 | CV | May 30, 2023 | Code

Real-World Image Variation by Aligning Diffusion Inversion Chain

Yuechen Zhang, Jinbo Xing, Eric Lo et al.

Recent advances in diffusion models have enabled high-fidelity images to be generated from text prompts. However, a domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images. Our investigation reveals that this domain gap originates from a gap between the latent distributions of different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar. Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain. Specifically, we demonstrate that step-wise latent distribution alignment is essential for generating high-quality variations. To attain this, we design a cross-image self-attention injection for feature interaction and a step-wise distribution normalization to align the latent features. Incorporating these alignment processes into a diffusion model allows RIVAL to generate high-quality image variations without further parameter optimization. Our experimental results demonstrate that the proposed approach outperforms existing methods in terms of semantic similarity and perceptual quality. This generalized inference pipeline can be easily applied to other diffusion-based generation tasks, such as image-conditioned text-to-image generation and stylization.
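One simple way to picture a distribution normalization step is mean and standard-deviation matching between two latents. The abstract does not specify the normalization RIVAL uses, so the sketch below is an assumption for illustration only; `align_distribution` is a hypothetical name, and flat Python lists stand in for latent tensors.

```python
import statistics


def align_distribution(generated, source, eps=1e-8):
    """Shift and scale `generated` so its mean and std match `source`.

    Assumed mean/std matching; not confirmed to be RIVAL's formulation.
    """
    g_mean = statistics.fmean(generated)
    g_std = statistics.pstdev(generated)
    s_mean = statistics.fmean(source)
    s_std = statistics.pstdev(source)
    # eps guards against division by zero for a constant latent.
    return [(x - g_mean) / (g_std + eps) * s_std + s_mean for x in generated]


# Toy "latents" at one denoising step; in a step-wise scheme this
# alignment would be repeated at every step of the chain.
generated_latent = [0.0, 2.0, 4.0]
source_latent = [10.0, 10.5, 11.0]
aligned = align_distribution(generated_latent, source_latent)
print(statistics.fmean(aligned), statistics.pstdev(aligned))
```

After alignment the generated latent keeps its relative ordering of values but takes on the source latent's first- and second-order statistics.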

3.2 | IR | Aug 18, 2018

Decentralized Search on Decentralized Web

Ziliang Lai, Chris Liu, Eric Lo et al.

The Decentralized Web, or DWeb, is envisioned as a promising future of the Web. Being decentralized, DWeb has no dedicated web servers; devices that retrieve web contents also serve their cached data to peer devices with straight privacy-preserving mechanisms. The fact that contents in DWeb are distributed, replicated, and decentralized leads to a number of key advantages over the conventional web. These include better resiliency against network partitioning and distributed denial-of-service (DDoS) attacks, and better browsing experiences in terms of shorter latency and higher throughput. Moreover, DWeb provides tamper-proof contents because each content piece is uniquely identified by a cryptographic hash. DWeb also fits well with future Internet architectures, such as Named Data Networking (NDN).

Search engines have been an inseparable element of the Web. Contemporary ("Web 2.0") search engines, however, provide centralized services. They are thus subject to DDoS attacks, insider threats, and ethical issues like search bias and censorship. As the web moves from being centralized to being decentralized, search engines ought to follow. We propose QueenBee, a decentralized search engine for DWeb. QueenBee is so named because worker bees and honeycomb are a common metaphor for distributed architectures, with the queen being the one that holds the colony together. QueenBee aims to revolutionize the search engine business model by offering incentives to both content providers and peers that participate in QueenBee's page indexing and ranking operations.
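The tamper-proof property mentioned in this abstract follows from addressing content by its hash: a peer can recompute the hash of whatever it receives and compare it with the identifier it asked for. The sketch below assumes SHA-256 as the hash function, which the abstract does not specify; `content_id` and `verify` are hypothetical helper names.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Check that data received from an untrusted peer matches its identifier."""
    return content_id(data) == expected_id


page = b"<html><body>Hello, DWeb</body></html>"
page_id = content_id(page)

# An honest peer serves the original bytes; a malicious one alters them.
print(verify(page, page_id))                 # True
print(verify(page + b"<!-- ad -->", page_id))  # False
```

Because the identifier depends only on the bytes, any peer holding a cached copy can serve it, and the requester does not need to trust that peer.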