Jie Zhong

LG
h-index15
7papers
118citations
Novelty56%
AI Score38

7 Papers

RONov 25, 2019Code
Zero-Shot Imitating Collaborative Manipulation Plans from YouTube Cooking Videos

Hejia Zhang, Jie Zhong, Stefanos Nikolaidis

People often watch videos on the web to learn how to cook new recipes, assemble furniture or repair a computer. We wish to enable robots with the very same capability. This is challenging; there is a large variation in manipulation actions and some videos even involve multiple persons, who collaborate by sharing and exchanging objects and tools. Furthermore, the learned representations need to be general enough to be transferable to robotic systems. On the other hand, previous work has shown that the space of human manipulation actions has a linguistic, hierarchical structure that relates actions to manipulated objects and tools. Building upon this theory of language for action, we propose a system for understanding and executing demonstrated action sequences from full-length, real-world cooking videos on the web. The system takes as input a new, previously unseen cooking video annotated with object labels and bounding boxes, and outputs a collaborative manipulation action plan for one or more robotic arms. We demonstrate performance of the system in a standardized dataset of 100 YouTube cooking videos, as well as in six full-length Youtube videos that include collaborative actions between two participants. We compare our system with a baseline system that consists of a state-of-the-art action detection baseline and show our system achieves higher action detection accuracy. We additionally propose an open-source platform for executing the learned plans in a simulation environment as well as with an actual robotic arm.

LGAug 4, 2025
Fitness aligned structural modeling enables scalable virtual screening with AuroBind

Zhongyue Zhang, Jiahua Rao, Jie Zhong et al.

Most human proteins remain undrugged, over 96% of human proteins remain unexploited by approved therapeutics. While structure-based virtual screening promises to expand the druggable proteome, existing methods lack atomic-level precision and fail to predict binding fitness, limiting translational impact. We present AuroBind, a scalable virtual screening framework that fine-tunes a custom atomic-level structural model on million-scale chemogenomic data. AuroBind integrates direct preference optimization, self-distillation from high-confidence complexes, and a teacher-student acceleration strategy to jointly predict ligand-bound structures and binding fitness. The proposed models outperform state-of-the-art models on structural and functional benchmarks while enabling 100,000-fold faster screening across ultra-large compound libraries. In a prospective screen across ten disease-relevant targets, AuroBind achieved experimental hit rates of 7-69%, with top compounds reaching sub-nanomolar to picomolar potency. For the orphan GPCRs GPR151 and GPR160, AuroBind identified both agonists and antagonists with success rates of 16-30%, and functional assays confirmed GPR160 modulation in liver and prostate cancer models. AuroBind offers a generalizable framework for structure-function learning and high-throughput molecular screening, bridging the gap between structure prediction and therapeutic discovery.

CLJun 7, 2021
Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang, Xiaoyuan Yao, Yiqun Duan et al.

It is desirable to include more controllable attributes to enhance the diversity of generated responses in open-domain dialogue systems. However, existing methods can generate responses with only one controllable attribute or lack a flexible way to generate them with multiple controllable attributes. In this paper, we propose a Progressively trained Hierarchical Encoder-Decoder (PHED) to tackle this task. More specifically, PHED deploys Conditional Variational AutoEncoder (CVAE) on Transformer to include one aspect of attributes at one stage. A vital characteristic of the CVAE is to separate the latent variables at each stage into two types: a global variable capturing the common semantic features and a specific variable absorbing the attribute information at that stage. PHED then couples the CVAE latent variables with the Transformer encoder and is trained by minimizing a newly derived ELBO and controlled losses to produce the next stage's input and produce responses as required. Finally, we conduct extensive evaluations to show that PHED significantly outperforms the state-of-the-art neural generation models and produces more diverse responses as expected.

CPFeb 2, 2021
A Stochastic Time Series Model for Predicting Financial Trends using NLP

Pratyush Muthukumar, Jie Zhong

Stock price forecasting is a highly complex and vitally important field of research. Recent advancements in deep neural network technology allow researchers to develop highly accurate models to predict financial trends. We propose a novel deep learning model called ST-GAN, or Stochastic Time-series Generative Adversarial Network, that analyzes both financial news texts and financial numerical data to predict stock trends. We utilize cutting-edge technology like the Generative Adversarial Network (GAN) to learn the correlations among textual and numerical data over time. We develop a new method of training a time-series GAN directly using the learned representations of Naive Bayes' sentiment analysis on financial text data alongside technical indicators from numerical data. Our experimental results show significant improvement over various existing models and prior research on deep neural networks for stock price forecasting.

LGMar 17, 2018
AutoML from Service Provider's Perspective: Multi-device, Multi-tenant Model Selection with GP-EI

Chen Yu, Bojan Karlas, Jie Zhong et al.

AutoML has become a popular service that is provided by most leading cloud service providers today. In this paper, we focus on the AutoML problem from the \emph{service provider's perspective}, motivated by the following practical consideration: When an AutoML service needs to serve {\em multiple users} with {\em multiple devices} at the same time, how can we allocate these devices to users in an efficient way? We focus on GP-EI, one of the most popular algorithms for automatic model selection and hyperparameter tuning, used by systems such as Google Vizer. The technical contribution of this paper is the first multi-device, multi-tenant algorithm for GP-EI that is aware of \emph{multiple} computation devices and multiple users sharing the same set of computation devices. Theoretically, given $N$ users and $M$ devices, we obtain a regret bound of $O((\text{\bf {MIU}}(T,K) + M)\frac{N^2}{M})$, where $\text{\bf {MIU}}(T,K)$ refers to the maximal incremental uncertainty up to time $T$ for the covariance matrix $K$. Empirically, we evaluate our algorithm on two applications of automatic model selection, and show that our algorithm significantly outperforms the strategy of serving users independently. Moreover, when multiple computation devices are available, we achieve near-linear speedup when the number of users is much larger than the number of devices.

DBAug 24, 2017
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads

Tian Li, Jie Zhong, Ji Liu et al.

We present ease.ml, a declarative machine learning service platform we built to support more than ten research groups outside the computer science departments at ETH Zurich for their machine learning needs. With ease.ml, a user defines the high-level schema of a machine learning application and submits the task via a Web interface. The system automatically deals with the rest, such as model selection and data movement. In this paper, we describe the ease.ml architecture and focus on a novel technical problem introduced by ease.ml regarding resource allocation. We ask, as a "service provider" that manages a shared cluster of machines among all our users running machine learning workloads, what is the resource allocation strategy that maximizes the global satisfaction of all our users? Resource allocation is a critical yet subtle issue in this multi-tenant scenario, as we have to balance between efficiency and fairness. We first formalize the problem that we call multi-tenant model selection, aiming for minimizing the total regret of all users running automatic model selection tasks. We then develop a novel algorithm that combines multi-armed bandits with Bayesian optimization and prove a regret bound under the multi-tenant setting. Finally, we report our evaluation of ease.ml on synthetic data and on one service we are providing to our users, namely, image classification with deep neural networks. Our experimental evaluation results show that our proposed solution can be up to 9.8x faster in achieving the same global quality for all users as the two popular heuristics used by our users before ease.ml.

MLApr 15, 2017
Asynchronous Parallel Empirical Variance Guided Algorithms for the Thresholding Bandit Problem

Jie Zhong, Yijun Huang, Ji Liu

This paper considers the multi-armed thresholding bandit problem -- identifying all arms whose expected rewards are above a predefined threshold via as few pulls (or rounds) as possible -- proposed by Locatelli et al. [2016] recently. Although the proposed algorithm in Locatelli et al. [2016] achieves the optimal round complexity in a certain sense, there still remain unsolved issues. This paper proposes an asynchronous parallel thresholding algorithm and its parameter-free version to improve the efficiency and the applicability. On one hand, the proposed two algorithms use the empirical variance to guide the pull decision at each round, and significantly improve the round complexity of the "optimal" algorithm when all arms have bounded high order moments. The proposed algorithms can be proven to be optimal. On the other hand, most bandit algorithms assume that the reward can be observed immediately after the pull or the next decision would not be made before all rewards are observed. Our proposed asynchronous parallel algorithms allow making the choice of the next pull with unobserved rewards from earlier pulls, which avoids such an unrealistic assumption and significantly improves the identification process. Our theoretical analysis justifies the effectiveness and the efficiency of proposed asynchronous parallel algorithms.