Runzhou Tao

9papers

61citations

Novelty62%

AI Score54

Ranked #29,206 of 201,326 authors (top 15%)#11,941 in CV (top 20%)

9 Papers

CVMar 15, 2023Code

Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency

Runzhou Tao, Wencheng Han, Zhongying Qiu et al.

Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application. A prominent advantage is that it does not need LiDAR point clouds during the inference. However, most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase. This inconsistency between the training and inference makes it hard to utilize the large-scale feedback data and increases the data collection expenses. To bridge this gap, we propose a new weakly supervised monocular 3D objection detection method, which can train the model with only 2D labels marked on images. To be specific, we explore three types of consistency in this task, i.e. the projection, multi-view and direction consistency, and design a weakly-supervised architecture based on these consistencies. Moreover, we propose a new 2D direction labeling method in this task to guide the model for accurate rotation direction prediction. Experiments show that our weakly-supervised method achieves comparable performance with some fully supervised methods. When used as a pre-training method, our model can significantly outperform the corresponding fully-supervised baseline with only 1/3 3D labels. https://github.com/weakmono3d/weakmono3d

DSAug 5, 2022

Sublinear Time Algorithm for Online Weighted Bipartite Matching

Hang Hu, Zhao Song, Runzhou Tao et al.

Online bipartite matching is a fundamental problem in online algorithms. The goal is to match two sets of vertices to maximize the sum of the edge weights, where for one set of vertices, each vertex and its corresponding edge weights appear in a sequence. Currently, in the practical recommendation system or search engine, the weights are decided by the inner product between the deep representation of a user and the deep representation of an item. The standard online matching needs to pay $nd$ time to linear scan all the $n$ items, computing weight (assuming each representation vector has length $d$), and then deciding the matching based on the weights. However, in reality, the $n$ could be very large, e.g. in online e-commerce platforms. Thus, improving the time of computing weights is a problem of practical significance. In this work, we provide the theoretical foundation for computing the weights approximately. We show that, with our proposed randomized data structures, the weights can be computed in sublinear time while still preserving the competitive ratio of the matching algorithm.

LGNov 3, 2022

A Convergence Theory for Federated Average: Beyond Smoothness

Xiaoxiao Li, Zhao Song, Runzhou Tao et al.

Federated learning enables a large amount of edge computing devices to learn a model without data sharing jointly. As a leading algorithm in this setting, Federated Average FedAvg, which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the sequences only once in a while, have been widely used due to their simplicity and low communication cost. However, despite recent research efforts, it lacks theoretical analysis under assumptions beyond smoothness. In this paper, we analyze the convergence of FedAvg. Different from the existing work, we relax the assumption of strong smoothness. More specifically, we assume the semi-smoothness and semi-Lipschitz properties for the loss function, which have an additional first-order term in assumption definitions. In addition, we also assume bound on the gradient, which is weaker than the commonly used bounded gradient assumption in the convergence analysis scheme. As a solution, this paper provides a theoretical convergence study on Federated Learning.

38.7QUANT-PHApr 3

DisQ: A Model of Distributed Quantum Processors (Extended Version)

Le Chang, Saitej Yavvari, Rance Cleaveland et al.

The next generation of distributed quantum processors combines single-location quantum computing and quantum networking techniques to permit large entangled qubit groups to be established through remote processors, and quantum algorithms can be executed distributively. We present DisQ, as the first formal model of distributed quantum processors, and permit the analysis of distributed quantum programs in the new computation environment. The core of DisQ is a distributed quantum programming language that combines the concepts of Chemical Abstract Machine (CHAM) and Markov Decision Processes (MDP) with the objective of providing clearly distinguishing quantum concurrent and distributed behaviors. Based on the DisQ language, we develop a simulation relation, based on classical simulation infrastructure, to check the equivalence of a quantum algorithm and its distributed versions so that users can develop the distributed version of a sequential quantum program via a simulation check.

94.2QUANT-PHMay 15

End-to-End Formalization of Quantum Error Correction

Mattias Ehatamm, Yi Lee, Xiaodi Wu et al.

Quantum error-correcting codes (QECCs) sit between noisy quantum hardware and reliable computation, so the code parameters used in practice must be trustworthy. The single number that summarizes a code's strength is its distance, yet certifying a distance lower bound is NP-hard in general, placing it beyond the reach of pen-and-paper proofs as well as direct proof-assistant scripting. As a result, distance values in the literature come either from non-scaling hand proofs, or from unverified solvers that leave a trust gap exactly where the code is supposed to provide a guarantee. We present Lean-QEC, the first Lean 4 formalization of stabilizer-code theory that delivers end-to-end, machine-checked distance certificates at industrial code sizes. Lean-QEC formalizes the linear algebra of qubit states, the Pauli group, stabilizer codes, the binary symplectic representation, classical coding theory, and the CSS and Bivariate Bicycle families. To break the combinatorial barrier, Lean-QEC translates the distance condition into a Boolean satisfiability formula through a verified reduction. The pipeline scales through a BitVec-flattened encoding that replaces Lean's Matrix representation, and an error-location encoding that reduces the variable count from $n$ to $k\lceil \log_2 n\rceil$. With these, we obtain automatically-generated Lean-checked distance proofs for a large range of industrially viable qLDPC codes within the Bivariate Bicycle and Generalized Bicycle families, including [[90, 8, 10]] and [[70, 6, 9]] BB codes, with the formulation scaling up to 144 qubits when performed outside the Lean kernel. The resulting library is reusable and is designed to plug into broader Lean-based efforts toward end-to-end verification of fault-tolerant quantum computation.

CVDec 13, 2025

From Human Intention to Action Prediction: Intention-Driven End-to-End Autonomous Driving

Huan Zheng, Yucheng Zhou, Tianyi Yan et al.

While end-to-end autonomous driving has achieved remarkable progress in geometric control, current systems remain constrained by a command-following paradigm that relies on simple navigational instructions. Transitioning to genuinely intelligent agents requires the capability to interpret and fulfill high-level, abstract human intentions. However, this advancement is hindered by the lack of dedicated benchmarks and semantic-aware evaluation metrics. In this paper, we formally define the task of Intention-Driven End-to-End Autonomous Driving and present Intention-Drive, a comprehensive benchmark designed to bridge this gap. We construct a large-scale dataset featuring complex natural language intentions paired with high-fidelity sensor data. To overcome the limitations of conventional trajectory-based metrics, we introduce the Imagined Future Alignment (IFA), a novel evaluation protocol leveraging generative world models to assess the semantic fulfillment of human goals beyond mere geometric accuracy. Furthermore, we explore the solution space by proposing two distinct paradigms: an end-to-end vision-language planner and a hierarchical agent-based framework. The experiments reveal a critical dichotomy where existing models exhibit satisfactory driving stability but struggle significantly with intention fulfillment. Notably, the proposed frameworks demonstrate superior alignment with human intentions.

CVNov 25, 2025

HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning

Hongji Yang, Yucheng Zhou, Wencheng Han et al.

Recent advances in diffusion models have demonstrated impressive capability in generating high-quality images for simple prompts. However, when confronted with complex prompts involving multiple objects and hierarchical structures, existing models struggle to accurately follow instructions, leading to issues such as concept omission, confusion, and poor compositionality. To address these limitations, we propose a Hierarchical Compositional Generative framework (HiCoGen) built upon a novel Chain of Synthesis (CoS) paradigm. Instead of monolithic generation, HiCoGen first leverages a Large Language Model (LLM) to decompose complex prompts into minimal semantic units. It then synthesizes these units iteratively, where the image generated in each step provides crucial visual context for the next, ensuring all textual concepts are faithfully constructed into the final scene. To further optimize this process, we introduce a reinforcement learning (RL) framework. Crucially, we identify that the limited exploration of standard diffusion samplers hinders effective RL. We theoretically prove that sample diversity is maximized by concentrating stochasticity in the early generation stages and, based on this insight, propose a novel Decaying Stochasticity Schedule to enhance exploration. Our RL algorithm is then guided by a hierarchical reward mechanism that jointly evaluates the image at the global, subject, and relationship levels. We also construct HiCoPrompt, a new text-to-image benchmark with hierarchical prompts for rigorous evaluation. Experiments show our approach significantly outperforms existing methods in both concept coverage and compositional accuracy.

LGFeb 2, 2021

Symmetric Sparse Boolean Matrix Factorization and Applications

Sitan Chen, Zhao Song, Runzhou Tao et al.

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given $\mathbf{M}\in\mathbb{Z}^{m\times m}$, we want to find $\mathbf{W}\in\{0,1\}^{m\times r}$ such that $\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all $\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation. Equivalently, this can be thought of as recovering a uniformly random $k$-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about $\mathbf{W}$ and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix $\mathbf{W}$ has full column rank with high probability as soon as $m = \widetildeΩ(r)$, which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

LGNov 24, 2020

InstaHide's Sample Complexity When Mixing Two Private Images

Baihe Huang, Zhao Song, Runzhou Tao et al.

Training neural networks usually require large numbers of sensitive training data, and how to protect the privacy of training data has thus become a critical topic in deep learning research. InstaHide is a state-of-the-art scheme to protect training data privacy with only minor effects on test accuracy, and its security has become a salient question. In this paper, we systematically study recent attacks on InstaHide and present a unified framework to understand and analyze these attacks. We find that existing attacks either do not have a provable guarantee or can only recover a single private image. On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity. In addition, we also provide a computational hardness result on retrieving all InstaHide images. Our results demonstrate that InstaHide is not information-theoretically secure but computationally secure in the worst case, even when mixing two private images.