Jinglin Zhao

CV
h-index27
4papers
4citations
Novelty40%
AI Score36

4 Papers

CVDec 7, 2025Code
VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors

Wenbo Lyu, Yingjun Du, Jinglin Zhao et al.

Understanding multi-image, multi-turn scenarios is a critical yet underexplored capability for Large Vision-Language Models (LVLMs). Existing benchmarks predominantly focus on static or horizontal comparisons -- e.g., spotting visual differences or assessing appropriateness -- while relying heavily on language cues. Such settings overlook progressive, context-dependent reasoning and the challenge of visual-to-visual inference. To bridge this gap, we present VisChainBench, a large-scale benchmark designed to rigorously evaluate LVLMs' ability to perform multi-step visual reasoning across sequential, interdependent tasks with minimal language guidance. VisChainBench contains 1,457 tasks spanning over 20,000 images across three diverse domains (e.g., daily scenarios, engineering troubleshooting), structured to mimic real-world decision-making processes. Uniquely, the benchmark is constructed using a multi-agent generation pipeline, ensuring high visual diversity and controlled language bias. All the benchmark data and code for benchmark construction are available for viewing and download via following Link: https://huggingface.co/datasets/eyehole/VisChainBench

CVSep 27, 2019Code
A Topological Nomenclature for 3D Shape Analysis in Connectomics

Abhimanyu Talwar, Zudi Lin, Donglai Wei et al.

One of the essential tasks in connectomics is the morphology analysis of neurons and organelles like mitochondria to shed light on their biological properties. However, these biological objects often have tangled parts or complex branching patterns, which make it hard to abstract, categorize, and manipulate their morphology. In this paper, we develop a novel topological nomenclature system to name these objects like the appellation for chemical compounds to promote neuroscience analysis based on their skeletal structures. We first convert the volumetric representation into the topology-preserving reduced graph to untangle the objects. Next, we develop nomenclature rules for pyramidal neurons and mitochondria from the reduced graph and finally learn the feature embedding for shape manipulation. In ablation studies, we quantitatively show that graphs generated by our proposed method align with the perception of experts. On 3D shape retrieval and decomposition tasks, we qualitatively demonstrate that the encoded topological nomenclature features achieve better results than state-of-the-art shape descriptors. To advance neuroscience, we will release a 3D segmentation dataset of mitochondria and pyramidal neurons reconstructed from a 100um cube electron microscopy volume with their reduced graph and topological nomenclature annotations. Code is publicly available at https://github.com/donglaiw/ibexHelper.

IRAug 21, 2024
DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

Yaowen Bi, Yuteng Lian, Jie Cui et al.

Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis across various tasks in MTL, we have observed an interesting divergence phenomenon that the same feature can have significantly different importance across different tasks in MTL. To address these issues, we propose Deep Multiple Task-specific Feature Interactions Network (DTN) with a novel model structure design. DTN introduces multiple diversified task-specific feature interaction methods and task-sensitive network in MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup. We applied DTN to our company's real-world E-commerce recommendation dataset, which consisted of over 6.3 billion samples, the results demonstrated that DTN significantly outperformed state-of-the-art MTL models. Moreover, during online evaluation of DTN in a large-scale E-commerce recommender system, we observed a 3.28% in clicks, a 3.10% increase in orders and a 2.70% increase in GMV (Gross Merchandise Value) compared to the state-of-the-art MTL models. Finally, extensive offline experiments conducted on public benchmark datasets demonstrate that DTN can be applied to various scenarios beyond recommendations, enhancing the performance of ranking models.

CVFeb 3, 2024
Mitigating Prior Shape Bias in Point Clouds via Differentiable Center Learning

Zhe Li, Xiying Wang, Jinglin Zhao et al.

Masked autoencoding and generative pretraining have achieved remarkable success in computer vision and natural language processing, and more recently, they have been extended to the point cloud domain. Nevertheless, existing point cloud models suffer from the issue of information leakage due to the pre-sampling of center points, which leads to trivial proxy tasks for the models. These approaches primarily focus on local feature reconstruction, limiting their ability to capture global patterns within point clouds. In this paper, we argue that the reduced difficulty of pretext tasks hampers the model's capacity to learn expressive representations. To address these limitations, we introduce a novel solution called the Differentiable Center Sampling Network (DCS-Net). It tackles the information leakage problem by incorporating both global feature reconstruction and local feature reconstruction as non-trivial proxy tasks, enabling simultaneous learning of both the global and local patterns within point cloud. Experimental results demonstrate that our method enhances the expressive capacity of existing point cloud models and effectively addresses the issue of information leakage.