Yi Zhu

SOFT

h-index: 21

4 papers

75 citations

Novelty: 46%

AI Score: 29

Ranked #141,128 of 194,257 authors (top 73%); #22 in SOFT (top 59%)

4 Papers

14.1 | CV | Sep 6, 2024

UNIT: Unifying Image and Text Recognition in One Vision Encoder

Yi Zhu, Yanpeng Zhou, Chunwei Wang et al.

Currently, vision encoder models such as Vision Transformers (ViTs) typically excel at image recognition tasks but, unlike human vision, cannot simultaneously support text recognition. To address this limitation, we propose UNIT, a novel training framework aimed at UNifying Image and Text recognition within a single model. Starting with a vision encoder pre-trained on image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities. The training process comprises two stages: intra-scale pretraining and inter-scale finetuning. During intra-scale pretraining, UNIT learns unified representations from multi-scale inputs, where images and documents are at their commonly used resolutions, to enable fundamental recognition capability. In the inter-scale finetuning stage, the model introduces scale-exchanged data, featuring images and documents at resolutions different from the most commonly used ones, to enhance its scale robustness. Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment. Experiments across multiple benchmarks confirm that our method significantly outperforms existing methods on document-related tasks (e.g., OCR and DocQA) while maintaining performance on natural images, demonstrating its ability to substantially enhance text recognition without compromising its core image recognition capabilities.

2.3 | SOFT | Apr 12, 2022 | Code

Harnessing Interpretable Machine Learning for Holistic Inverse Design of Origami

Yi Zhu, Evgueni T. Filipov

This work harnesses interpretable machine learning methods to address the challenging inverse design problem of origami-inspired systems. We show that a decision tree-random forest method is particularly suitable for fitting origami databases, containing both design features and functional performance, to generate human-understandable decision rules for the inverse design of functional origami. First, the tree method is unique because it can handle complex interactions between categorical features and continuous features, allowing it to compare different origami patterns for a design. Second, this interpretable method can tackle multi-objective problems for designing functional origami with multiple and multi-physical performance targets. Finally, the method can extend existing shape-fitting algorithms for origami to consider non-geometrical performance. The proposed framework enables holistic inverse design of origami, considering both shape and function, to build novel reconfigurable structures for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.
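As a rough illustration of how a tree-style split can mix categorical and continuous design features and yield a human-readable rule, the sketch below searches for the single best split (a decision stump) by Gini impurity. It is not the authors' decision tree-random forest implementation: the feature names (`pattern`, `thickness_mm`), the pattern labels, and the toy database are all hypothetical, chosen only to make the example run.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_rule(rows, labels, categorical):
    """Return (weighted impurity, rule text) of the best single split.

    Categorical features are tested with equality, continuous features
    with a threshold placed midway between adjacent observed values.
    """
    best = None
    n = len(rows)
    for feat in rows[0]:
        values = sorted({r[feat] for r in rows})
        if feat in categorical:
            candidates = [
                (f"{feat} == {v!r}", lambda r, v=v, feat=feat: r[feat] == v)
                for v in values
            ]
        else:
            mids = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
            candidates = [
                (f"{feat} <= {t:g}", lambda r, t=t, feat=feat: r[feat] <= t)
                for t in mids
            ]
        for text, pred in candidates:
            left = [y for r, y in zip(rows, labels) if pred(r)]
            right = [y for r, y in zip(rows, labels) if not pred(r)]
            if not left or not right:
                continue  # a split must separate the data into two groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, text)
    return best


# Hypothetical toy database: design features plus a 0/1 flag saying whether
# the design met some performance target. The values are made up.
rows = [
    {"pattern": "A", "thickness_mm": 0.5},
    {"pattern": "A", "thickness_mm": 1.0},
    {"pattern": "A", "thickness_mm": 2.0},
    {"pattern": "B", "thickness_mm": 0.5},
    {"pattern": "B", "thickness_mm": 1.5},
    {"pattern": "B", "thickness_mm": 2.0},
]
meets_target = [1, 1, 0, 1, 0, 0]

score, rule = best_rule(rows, meets_target, categorical={"pattern"})
print(f"if {rule}: design meets target (weighted Gini = {score:.2f})")
```

A full tree would apply the same search recursively to each side of the split, and a forest would repeat it over resampled data; the stump is kept here only because it makes the form of the resulting design rule easy to read.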

3.3 | MN | Jun 1, 2020

When Machine Learning Meets Multiscale Modeling in Chemical Reactions

Wuyue Yang, Liangrong Peng, Yi Zhu et al.

Due to the intrinsic complexity and nonlinearity of chemical reactions, direct application of traditional machine learning algorithms may face many difficulties. In this study, through two concrete examples with a biological background, we illustrate how the key ideas of multiscale modeling can substantially reduce the computational cost of machine learning, as well as how machine learning algorithms perform model reduction automatically in a time-scale separated system. Our study highlights the necessity and effectiveness of integrating machine learning algorithms with multiscale modeling in the study of chemical reactions.

11.3 | DS | May 11, 2020

Revealing hidden dynamics from time-series data by ODENet

Pipi Hu, Wuyue Yang, Yi Zhu et al.

Deriving hidden dynamics from observed data is a fundamental yet challenging problem in many different fields. In this study, we propose a new type of interpretable network called the ordinary differential equation network (ODENet), in which the numerical integration of explicit ordinary differential equations (ODEs) is embedded into the machine learning scheme to build a general framework for revealing the hidden dynamics buried in massive time-series data efficiently and reliably. ODENet takes full advantage of both machine learning algorithms and ODE modeling. On the one hand, embedding ODEs makes the framework more interpretable, since it benefits from the mature theory of ODEs. On the other hand, the machine learning scheme allows data handling, parallelization, and optimization to be implemented easily and efficiently. From the classical Lotka-Volterra equations to the chaotic Lorenz equations, ODENet exhibits a remarkable capability for handling time-series data, even in the presence of large noise. We further apply ODENet to real actin aggregation data, where it also shows impressive performance. These results demonstrate the superiority of ODENet over other traditional machine learning algorithms in dealing with noisy data and with data that have either unequal spacing or large sampling time steps.
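The general idea of placing a numerical integrator inside the fitting objective can be sketched on the Lotka-Volterra system mentioned above. This is a deliberately simplified stand-in, not the ODENet implementation: it assumes a small candidate library of terms (x, y, xy), uses a forward-Euler one-step prediction as the embedded integrator, and solves the resulting linear least-squares problem directly instead of training a network. All function names, parameter values, and the step size are assumptions made for the example.

```python
def lotka_volterra(state, a=1.0, b=0.5, c=1.0, d=0.5):
    """Ground-truth dynamics used only to generate synthetic data."""
    x, y = state
    return (a * x - b * x * y, -c * y + d * x * y)


def rk4_step(f, state, dt):
    """One classical Runge-Kutta step, used to produce accurate data."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (p + 2 * q + 2 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def simulate(f, state, dt, steps):
    traj = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        traj.append(state)
    return traj


def library(state):
    """Candidate right-hand-side terms (an assumed, hand-picked library)."""
    x, y = state
    return (x, y, x * y)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def fit(traj, dt):
    """Least-squares fit of w so that s_n + dt * library(s_n) . w matches s_{n+1}."""
    feats = [library(s) for s in traj[:-1]]
    m = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) for j in range(m)] for i in range(m)]
    coeffs = []
    for dim in range(2):
        targets = [(nxt[dim] - cur[dim]) / dt for cur, nxt in zip(traj[:-1], traj[1:])]
        rhs = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(m)]
        coeffs.append(solve(gram, rhs))
    return coeffs


DT = 0.002
trajectory = simulate(lotka_volterra, (2.0, 1.0), DT, 5000)
coeffs = fit(trajectory, DT)
for name, row in zip(("dx/dt", "dy/dt"), coeffs):
    print(name, "=", " + ".join(f"{w:.3f}*{t}" for w, t in zip(row, ("x", "y", "xy"))))
```

Because the fitted model is an explicit ODE with named terms, the recovered coefficients can be read directly as a candidate governing equation. The sketch uses clean, evenly spaced data; handling noise, uneven spacing, or large time steps, which the abstract highlights, would need a more careful integrator and optimizer than this one-step Euler fit.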