Jennifer Zhang

AI
h-index: 76
3 papers
2 citations
Novelty: 42%
AI Score: 42

3 Papers

AI · Jan 27
Insight Agents: An LLM-Based Multi-Agent System for Data Insights

Jincheng Bai, Zhenyu Zhang, Jennifer Zhang et al.

Today, e-commerce sellers face several key challenges, including difficulty discovering and effectively using available programs and tools, and difficulty understanding and using the rich data those tools provide. We therefore aim to develop Insight Agents (IA), a conversational multi-agent data insight system that provides e-commerce sellers with personalized data and business insights through automated information retrieval. Our hypothesis is that IA will serve as a force multiplier for sellers, driving incremental seller adoption by reducing the effort required and increasing the speed at which sellers make good business decisions. In this paper, we introduce this novel LLM-backed, end-to-end agentic system, built on a plan-and-execute paradigm and designed for comprehensive coverage, high accuracy, and low latency. It features a hierarchical multi-agent structure consisting of a manager agent and two worker agents (data presentation and insight generation) for efficient information retrieval and problem-solving. We design a simple yet effective ML solution for the manager agent that combines Out-of-Domain (OOD) detection using a lightweight encoder-decoder model with agent routing through a BERT-based classifier, optimizing both accuracy and latency. Within the two worker agents, a strategic planning step is designed for the API-based data model that breaks queries down into granular components to generate more accurate responses, and domain knowledge is dynamically injected to enhance the insight generator. IA has been launched for Amazon sellers in the US, achieving 90% accuracy in human evaluation with a P90 latency below 15 s.
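The manager agent's two-stage decision (an OOD gate first, then routing to one of the two workers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the IA implementation: `route_query`, `toy_ood_score`, `toy_classify`, the threshold, and the label strings are all hypothetical, and the keyword-based toy functions merely stand in for the lightweight encoder-decoder OOD model and the BERT-based classifier described in the abstract.

```python
from typing import Callable, Dict


def route_query(
    query: str,
    ood_score: Callable[[str], float],
    classify: Callable[[str], Dict[str, float]],
    threshold: float,
) -> str:
    """Two-stage routing: reject out-of-domain queries, then pick a worker agent."""
    # Stage 1 (OOD gate): a higher score means the query looks less like in-domain
    # traffic, e.g. a reconstruction error from an encoder-decoder model (assumption).
    if ood_score(query) > threshold:
        return "out_of_domain"
    # Stage 2 (routing): choose the worker with the highest classifier probability.
    probs = classify(query)
    return max(probs, key=probs.get)


# Hypothetical toy stand-ins for the real models, for illustration only.
IN_DOMAIN_TERMS = {"sales", "traffic", "inventory", "orders", "why"}


def toy_ood_score(query: str) -> float:
    # Fraction of words that are NOT in a tiny in-domain vocabulary.
    words = [w.strip("?.,") for w in query.lower().split()]
    hits = sum(w in IN_DOMAIN_TERMS for w in words)
    return 1.0 - hits / max(len(words), 1)


def toy_classify(query: str) -> Dict[str, float]:
    # "Why" questions lean toward insight generation; everything else toward data presentation.
    p = 0.8 if query.lower().startswith("why") else 0.2
    return {"insight_generation": p, "data_presentation": 1.0 - p}


if __name__ == "__main__":
    print(route_query("Why did sales drop?", toy_ood_score, toy_classify, 0.6))
    # -> insight_generation
    print(route_query("What is the weather tomorrow?", toy_ood_score, toy_classify, 0.6))
    # -> out_of_domain
```

Putting the OOD check first means clearly off-topic queries are rejected before the classifier runs, which matches the abstract's stated goal of optimizing both accuracy and latency in the manager agent.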

IV · Jul 8, 2025
ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease

Zhiyuan Yang, Kai Li, Sophia Ghamoshi Ramandi et al.

Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available CoPath datasets annotated with extensive, granular histological tissue type (HTT) taxonomies remain scarce because of the significant expertise and high annotation costs required. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized to multiple organs, but they limit the capability for in-depth studies of specific organ diseases. Building on this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a three-level hierarchical taxonomy of 32 distinct HTTs. Furthermore, we train a multilabel representation learning model on ADPv2 using a two-stage training procedure. Leveraging the VMamba architecture, we achieve a mean average precision (mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that our dataset supports organ-specific, in-depth studies for potential biomarker discovery: analyzing the model's prediction behavior on tissues affected by different colon diseases reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available at https://zenodo.org/records/15307021
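Mean average precision, the metric reported for ADPv2, is commonly computed as a per-class average precision (AP) that is then averaged over classes. The sketch below assumes that common ranking-based definition (mean precision at the rank of each positive example) and skips classes with no positives; the abstract does not say which exact variant was used, and the function names and toy labels and scores are made up for illustration.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of the precision measured at the rank of each positive.

    Tied scores keep their input order (Python's sort is stable); no special tie handling.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision of the top-`rank` predictions
    return precision_sum / hits if hits else 0.0


def mean_average_precision(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """mAP over classes (columns); classes with no positive example are skipped."""
    aps = []
    for c in range(len(y_true[0])):
        labels = [row[c] for row in y_true]
        if not any(labels):
            continue  # AP is undefined when a class has no positives
        scores = [row[c] for row in y_score]
        aps.append(average_precision(labels, scores))
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Three patches, two tissue-type labels each (multilabel: a patch may carry both).
    y_true = [[1, 0], [0, 1], [1, 1]]
    y_score = [[0.9, 0.5], [0.3, 0.8], [0.6, 0.4]]
    print(mean_average_precision(y_true, y_score))  # class APs are 1.0 and 5/6, so about 0.9167
```

Averaging per class, rather than pooling all predictions, gives rare tissue types the same weight as common ones, which is one reason mAP is a usual choice for multilabel classification over a large taxonomy.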

LG · Jul 7, 2025
Heterogeneous Causal Learning for Optimizing Aggregated Functions in User Growth

Shuyang Du, Jennifer Zhang, Will Y. Zou

User growth is a major strategy for consumer internet companies. To optimize costly marketing campaigns and maximize user engagement, we propose a novel treatment effect optimization methodology to enhance user growth marketing. By leveraging deep learning, our algorithm learns from past experiments to optimize user selection and reward allocation, maximizing campaign impact while minimizing costs. Unlike traditional prediction methods, our approach directly models the uplift in key business metrics. Further, our deep learning model can jointly optimize parameters for an aggregated loss function using softmax gating. Our approach surpasses traditional methods by directly targeting the desired business metrics and offers greater algorithmic flexibility in handling complex business constraints. Comprehensive evaluations, including comparisons with state-of-the-art techniques such as R-learner and Causal Forest, validate the effectiveness of our model. Experiments show that our proposed constrained and direct optimization algorithms significantly outperform state-of-the-art methods by over 20%, demonstrating their cost-efficiency and real-world impact. These versatile methods can be applied to various product scenarios, including optimal treatment allocation, and their effectiveness has also been validated through successful worldwide production deployments.
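The abstract does not give the model's loss, so the following is only a hypothetical illustration of what a softmax-gated aggregate objective can look like: per-user scores become soft selection weights through a softmax, and the objective is the weighted incremental cost divided by the weighted incremental gain (lower is better). The per-user uplift values are assumed to be given (in practice they would be estimated from past experiments), and `aggregated_cost_per_gain` is a name chosen for this sketch, not the authors' method.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_cost_per_gain(
    scores: Sequence[float],
    uplift_gain: Sequence[float],
    uplift_cost: Sequence[float],
) -> float:
    """Hypothetical aggregate objective: expected incremental cost per unit of
    incremental gain under softmax selection weights (assumes positive total gain)."""
    weights = softmax(scores)
    gain = sum(w * g for w, g in zip(weights, uplift_gain))
    cost = sum(w * c for w, c in zip(weights, uplift_cost))
    return cost / gain


if __name__ == "__main__":
    gain = [2.0, 1.0, 0.5]  # assumed per-user incremental engagement
    cost = [1.0, 1.0, 1.0]  # assumed per-user incremental cost
    uniform = aggregated_cost_per_gain([0.0, 0.0, 0.0], gain, cost)
    focused = aggregated_cost_per_gain([5.0, 0.0, 0.0], gain, cost)
    print(uniform, focused)  # focusing weight on the high-uplift user lowers the ratio
```

Because this objective is a smooth function of the scores, a network that outputs the scores could in principle be trained by gradient descent on it directly; a ratio of aggregates like this cannot be written as a sum of independent per-user losses.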