LG | AI | Jan 13

Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance

arXiv:2601.08418v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a crucial but underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms, though it appears incremental as it builds on existing methods with specific enhancements.

The paper tackles hierarchical tax code prediction for e-commerce platforms by introducing Taxon, a framework that combines a feature-gating mixture-of-experts architecture with semantic consistency guidance from LLMs. It reports state-of-the-art performance with improved accuracy and structural consistency, and is deployed in a production system handling over 500,000 queries per day.

Tax code prediction is a crucial yet underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms. Each product must be accurately mapped to a node within a multi-level taxonomic hierarchy defined by national standards, where errors lead to financial inconsistencies and regulatory risks. This paper presents Taxon, a semantically aligned and expert-guided framework for hierarchical tax code prediction. Taxon integrates (i) a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels, and (ii) a semantic consistency model distilled from large language models acting as domain experts to verify alignment between product titles and official tax definitions. To address noisy supervision in real business records, we design a multi-source training pipeline that combines curated tax databases, invoice validation logs, and merchant registration data to provide both structural and semantic supervision. Extensive experiments on the proprietary TaxCode dataset and public benchmarks demonstrate that Taxon achieves state-of-the-art performance, outperforming strong baselines. Further, an additional full hierarchical path reconstruction procedure significantly improves structural consistency, yielding the highest overall F1 scores. Taxon has been deployed in production within Alibaba's tax service system, handling an average of over 500,000 tax code queries per day and reaching peak volumes above five million requests during business events, with improved accuracy, interpretability, and robustness.
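The abstract does not say how the full hierarchical path reconstruction works, so the following is only a minimal sketch of one plausible reading: take the predicted leaf code and walk up the taxonomy to rebuild a structurally consistent root-to-leaf path. The `PARENT` map, the code values, and both function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child code -> parent code (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "101": "1",
    "10101": "101",
    "10102": "101",
    "2": None,
    "201": "2",
}


def reconstruct_path(leaf: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Walk from a predicted leaf up to its root; return the root-first path."""
    path: List[str] = []
    node: Optional[str] = leaf
    while node is not None:
        if node not in parent:
            raise KeyError(f"unknown tax code: {node}")
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


def is_consistent(levels: List[str], parent: Dict[str, Optional[str]]) -> bool:
    """True if every per-level prediction is the parent of the next level's."""
    return all(parent.get(child) == par for par, child in zip(levels, levels[1:]))


# Independent per-level predictions can disagree with the hierarchy.
per_level = ["2", "101", "10102"]
if not is_consistent(per_level, PARENT):
    # Assumption: trust the leaf and rebuild its ancestors from the taxonomy.
    per_level = reconstruct_path(per_level[-1], PARENT)
print(per_level)
```

In this sketch the leaf prediction is treated as authoritative; a real system might instead score every candidate path, which the abstract neither confirms nor rules out.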
