TXtract: Taxonomy-Aware Knowledge Extraction for Thousands of Product Categories
The work addresses the scalability of knowledge extraction in e-commerce, which matters for applications that require broad category coverage. The contribution is incremental, however, since it extends existing extraction methods with taxonomy integration.
The paper tackles the problem of extracting structured knowledge from product profiles across thousands of diverse e-commerce categories. It proposes TXtract, a taxonomy-aware model that outperforms state-of-the-art approaches by up to 10% in F1 and 15% in coverage.
Extracting structured knowledge from product profiles is crucial for various applications in e-Commerce. State-of-the-art approaches for knowledge extraction were each designed for a single category of product, and thus do not apply to real-life e-Commerce scenarios, which often contain thousands of diverse categories. This paper proposes TXtract, a taxonomy-aware knowledge extraction model that applies to thousands of product categories organized in a hierarchical taxonomy. Through category conditional self-attention and multi-task learning, our approach is both scalable, as it trains a single model for thousands of categories, and effective, as it extracts category-specific attribute values. Experiments on products from a taxonomy with 4,000 categories show that TXtract outperforms state-of-the-art approaches by up to 10% in F1 and 15% in coverage across all categories.
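To make the idea of category conditional self-attention more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an additive attention form in which one category embedding shifts the score of every token pair, followed by a softmax over tokens. The function name, the parameter matrices W1, W2, W3, the vector v, and all dimensions are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def category_conditional_attention(H, c, W1, W2, W3, v):
    """Attend over token states H, conditioned on a category embedding c.

    H: (T, d) token representations for one product profile.
    c: (dc,) embedding of the product's category (assumed to be given).
    W1, W2: (d, p), W3: (dc, p), v: (p,) -- hypothetical learned parameters.
    Returns the (T, d) context vectors and the (T, T) attention weights.
    """
    q = H @ W1  # (T, p) projection of each attending token
    k = H @ W2  # (T, p) projection of each attended token
    g = c @ W3  # (p,)  category term, shared by all token pairs

    # Additive score for every pair (i, j); the category term is broadcast.
    scores = np.tanh(q[:, None, :] + k[None, :, :] + g) @ v  # (T, T)

    # Numerically stable softmax over the attended tokens (axis 1).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ H, weights


# Usage with random, untrained parameters (illustrative only).
rng = np.random.default_rng(0)
T, d, dc, p = 5, 8, 4, 6
H = rng.normal(size=(T, d))
W1, W2 = rng.normal(size=(d, p)), rng.normal(size=(d, p))
W3, v = rng.normal(size=(dc, p)), rng.normal(size=p)
context, weights = category_conditional_attention(
    H, rng.normal(size=dc), W1, W2, W3, v
)
```

Because the category term enters every score, the same token sequence receives different attention weights under different categories, which is one plausible way for a single shared model to produce category-specific extractions.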