Text Classification Components for Detecting Descriptions and Names of CAD models
This work addresses the need for accurate text classification in a domain-specific search engine for CAD models, but the contribution is incremental, since it applies existing methods to a new application area.
The paper tackles two problems in a specialized search engine for 3D CAD models: distinguishing product descriptions from other website text, and identifying product names. It uses paragraph vectors, a character-level LSTM, and an LSTM tagger, and reports promising initial results, some of which are ready for production use.
We apply text analysis approaches in a specialized search engine for 3D CAD models and associated products. The main goals are to distinguish actual product descriptions from other text on a website, and to decide whether a given text is, or contains, a product name. For this, we use paragraph vectors for text classification, a character-level long short-term memory network (LSTM) for single-word classification, and an LSTM tagger based on word embeddings for detecting product names within sentences. Although larger datasets still need to be collected for our specific problem domain, the first results are promising and partially fit for production use.
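The two LSTM components named above can be sketched roughly as follows. This is a minimal, forward-pass-only illustration in plain Python, not the authors' implementation: the character set `ALPHABET`, the BIO tag names (`O`, `B-PRODUCT`, `I-PRODUCT`), the hidden size, and the class names `CharWordClassifier` and `LSTMTagger` are all hypothetical. The weights are random rather than trained, and training by backpropagation is omitted. The character-level model reads one word character by character and maps its final hidden state to a probability that the word is a product name. The tagger reads word-embedding vectors and emits one tag per token.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-_./ "  # hypothetical character set
TAGS = ["O", "B-PRODUCT", "I-PRODUCT"]  # hypothetical BIO tag set for product names


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Single-layer LSTM with random (untrained) weights; forward pass only."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One matrix per gate over [input; previous hidden]: i=input, f=forget, g=candidate, o=output.
        self.w = {k: rand_matrix(rng, hidden_size, input_size + hidden_size) for k in "ifgo"}
        self.b = {k: [1.0 if k == "f" else 0.0] * hidden_size for k in "ifgo"}  # forget bias 1.0 is a common init

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x_t; h_{t-1}]
        pre = {k: [s + b for s, b in zip(matvec(self.w[k], z), self.b[k])] for k in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]  # new cell state
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]  # new hidden state
        return h, c

    def run(self, inputs):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        states = []  # hidden state after every time step
        for x in inputs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def one_hot(ch):
    """Encode one character; the extra last slot stands for unknown characters."""
    vec = [0.0] * (len(ALPHABET) + 1)
    idx = ALPHABET.find(ch.lower())
    vec[idx if idx >= 0 else len(ALPHABET)] = 1.0
    return vec


class CharWordClassifier:
    """Character-level LSTM: does a single word look like a product name?"""

    def __init__(self, hidden_size=16, seed=0):
        rng = random.Random(seed)
        self.lstm = LSTMCell(len(ALPHABET) + 1, hidden_size, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    def prob_product_name(self, word):
        states = self.lstm.run([one_hot(ch) for ch in word])
        last = states[-1] if states else [0.0] * self.lstm.hidden_size  # only the final state is used
        return sigmoid(sum(w * h for w, h in zip(self.w_out, last)))


class LSTMTagger:
    """Word-level LSTM over embeddings: one BIO tag per token."""

    def __init__(self, embeddings, hidden_size=16, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # lowercase word -> embedding vector
        self.dim = len(next(iter(embeddings.values())))
        self.lstm = LSTMCell(self.dim, hidden_size, rng)
        self.w_out = rand_matrix(rng, len(TAGS), hidden_size)

    def tag(self, tokens):
        unknown = [0.0] * self.dim  # out-of-vocabulary words map to a zero vector
        vectors = [self.embeddings.get(t.lower(), unknown) for t in tokens]
        scores = [matvec(self.w_out, h) for h in self.lstm.run(vectors)]  # every time step is scored
        return [TAGS[max(range(len(TAGS)), key=s.__getitem__)] for s in scores]


# Usage with toy random vectors standing in for trained word embeddings.
rng = random.Random(42)
embeddings = {w: [rng.gauss(0.0, 1.0) for _ in range(8)] for w in "download the m6 hex bolt cad model".split()}
print(CharWordClassifier().prob_product_name("DIN933-M6"))
print(LSTMTagger(embeddings).tag("Download the M6 hex bolt CAD model".split()))
```

Because the weights are untrained, the printed probability and tags carry no meaning. The sketch only shows the difference in data flow: the word classifier uses the last hidden state alone, while the tagger scores every time step, which is what allows it to mark a product name inside a sentence.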