DBSep 2, 2022
AnaMeta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis TasksXinyi He, Mengyu Zhou, Mingjie Zhou et al. · stanford
Tabular data analysis is performed every day across various domains. It requires an accurate understanding of field semantics to correctly operate on table fields and find common patterns in daily analysis. In this paper, we introduce the AnaMeta dataset, a collection of 467k tables with derived supervision labels for four types of commonly used field metadata: measure/dimension dichotomy, common field roles, semantic field type, and default aggregation function. We evaluate a wide range of models for inferring metadata as the benchmark. We also propose a multi-encoder framework, called KDF, which improves the metadata understanding capability of tabular models by incorporating distribution and knowledge information. Furthermore, we propose four interfaces for incorporating field metadata into downstream analysis tasks.
IRMay 18Code
DADF: A Distribution-Aware Debiasing Framework for Watch-Time Regression in Recommender SystemsYiqing Yang, Xinlong Zhao, Zhao Liu et al.
Watch-time prediction is a central regression task in short-video recommender systems, where labels are highly long-tailed and residual errors vary systematically across observed watch-time regions. In practice, a model may appear globally calibrated while still overestimating short views and underestimating long views, because opposite errors cancel out in aggregate. Existing methods mainly improve the first-stage watch-time predictor, but often leave such residual distributional bias insufficiently corrected. We propose DADF, a distribution-aware debiasing framework for watch-time regression. Instead of replacing a deployed predictor, DADF performs second-stage multiplicative residual correction on top of it. DADF combines three complementary designs: a dynamic distribution-aware transformation for stabilizing long-tailed correction targets, a debias-factor-aware module for modeling heterogeneous residual patterns using inference-time observable factors, especially video duration, and a multi-label-aware module that exploits auxiliary prediction signals from engagement heads. We evaluate DADF on public short-video benchmarks and a large-scale industrial ranking system. DADF consistently improves both pointwise accuracy and ranking quality across datasets and backbones. In the industrial setting, it achieves a 1.88 percentage-point WUAUC gain over the production baseline, reduces MAE by 12.57%, and yields a statistically significant 0.347% lift in average time spent per device in online A/B testing. These results demonstrate that DADF effectively mitigates local calibration bias and provides a practical plug-in solution for debiasing long-tailed continuous targets. The source code is available at https://github.com/liuzhao09/DADF.
CLOct 22, 2025Code
SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large SpreadsheetsZiwei Wang, Jiayuan Su, Mengyu Zhou et al.
Understanding and reasoning over complex spreadsheets remain fundamental challenges for large language models (LLMs), which often struggle with accurately capturing the complex structure of tables and ensuring reasoning correctness. In this work, we propose SheetBrain, a neuro-symbolic dual workflow agent framework designed for accurate reasoning over tabular data, supporting both spreadsheet question answering and manipulation tasks. SheetBrain comprises three core modules: an understanding module, which produces a comprehensive overview of the spreadsheet - including sheet summary and query-based problem insight to guide reasoning; an execution module, which integrates a Python sandbox with preloaded table-processing libraries and an Excel helper toolkit for effective multi-turn reasoning; and a validation module, which verifies the correctness of reasoning and answers, triggering re-execution when necessary. We evaluate SheetBrain on multiple public tabular QA and manipulation benchmarks, and introduce SheetBench, a new benchmark targeting large, multi-table, and structurally complex spreadsheets. Experimental results show that SheetBrain significantly improves accuracy on both existing benchmarks and the more challenging scenarios presented in SheetBench. Our code is publicly available at https://github.com/microsoft/SheetBrain.
IROct 21, 2025Code
DiffGRM: Diffusion-based Generative Recommendation ModelZhao Liu, Yichen Zhu, Yiqing Yang et al.
Generative recommendation (GR) is an emerging paradigm that represents each item via a tokenizer as an n-digit semantic ID (SID) and predicts the next item by autoregressively generating its SID conditioned on the user's history. However, two structural properties of SIDs make ARMs ill-suited. First, intra-item consistency: the n digits jointly specify one item, yet the left-to-right causality trains each digit only under its prefix and blocks bidirectional cross-digit evidence, collapsing supervision to a single causal path. Second, inter-digit heterogeneity: digits differ in semantic granularity and predictability, while the uniform next-token objective assigns equal weight to all digits, overtraining easy digits and undertraining hard digits. To address these two issues, we propose DiffGRM, a diffusion-based GR model that replaces the autoregressive decoder with a masked discrete diffusion model (MDM), thereby enabling bidirectional context and any-order parallel generation of SID digits for recommendation. Specifically, we tailor DiffGRM in three aspects: (1) tokenization with Parallel Semantic Encoding (PSE) to decouple digits and balance per-digit information; (2) training with On-policy Coherent Noising (OCN) that prioritizes uncertain digits via coherent masking to concentrate supervision on high-value signals; and (3) inference with Confidence-guided Parallel Denoising (CPD) that fills higher-confidence digits first and generates diverse Top-K candidates. Experiments show consistent gains over strong generative and discriminative recommendation baselines on multiple datasets, improving NDCG@10 by 6.9%-15.5%. Code is available at https://github.com/liuzhao09/DiffGRM.
IVApr 15, 2024
ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head DetectionJiayi Wang, Yi-An Mao, Xiaoyu Ma et al.
Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, poses challenges to the generalizability of ONH detection approaches developed based on semantic segmentation networks. Despite the numerous recent advancements in general-purpose semantic segmentation methods using convolutional neural networks (CNNs) and Transformers, there is currently a lack of benchmarks for these state-of-the-art (SoTA) networks specifically trained for ONH detection. Therefore, in this article, we make contributions from three key aspects: network design, the publication of a dataset, and the establishment of a comprehensive benchmark. Our newly developed ONH detection network, referred to as ODFormer, is based upon the Swin Transformer architecture and incorporates two novel components: a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. Our published large-scale dataset, known as TongjiU-DROD, provides multi-resolution fundus images for each participant, captured using two distinct types of cameras. Our established benchmark involves three datasets: DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD, created by researchers from different countries and containing fundus images captured from participants of diverse races and ages. Extensive experimental results demonstrate that our proposed ODFormer outperforms other state-of-the-art (SoTA) networks in terms of performance and generalizability. Our dataset and source code are publicly available at mias.group/ODFormer.
CVMay 30, 2025
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training ModelsYing Yang, Jie Zhang, Xiao Lv et al.
While adversarial attacks on vision-and-language pretraining (VLP) models have been explored, generating natural adversarial samples crafted through realistic and semantically meaningful perturbations remains an open challenge. Existing methods, primarily designed for classification tasks, struggle when adapted to VLP models due to their restricted optimization spaces, leading to ineffective attacks or unnatural artifacts. To address this, we propose \textbf{LightD}, a novel framework that generates natural adversarial samples for VLP models via semantically guided relighting. Specifically, LightD leverages ChatGPT to propose context-aware initial lighting parameters and integrates a pretrained relighting model (IC-light) to enable diverse lighting adjustments. LightD expands the optimization space while ensuring perturbations align with scene semantics. Additionally, gradient-based optimization is applied to the reference lighting image to further enhance attack effectiveness while maintaining visual naturalness. The effectiveness and superiority of the proposed LightD have been demonstrated across various VLP models in tasks such as image captioning and visual question answering.