Organize content based on relevant categories/topics
Datasheet
Nstein Text Mining Engine: Best-of-breed enterprise semantic analysis software.
Ncategorizer automatically categorizes content, creating relevant relationships between documents.
Overview
Nstein's Ncategorizer automatically indexes and sorts documents by category and scores how relevant each category is to a document. Building on the concepts extracted by Nconcept extractor, it analyzes those concepts to create a profile of each document, then queries that profile against an extensible knowledge base.
Applications
Taxonomic Search: Language is often ambiguous. Take the term "virus": a search query on that term returns documents that use both the computer meaning and the pathogen meaning. Ncategorizer enables taxonomic search, which disambiguates terms by organizing results into parent categories and subcategories, leading to an enhanced user experience.
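A minimal Python sketch of the idea, assuming each document has already been assigned a parent category and a subcategory: a plain keyword match for "virus" mixes both senses, while grouping the hits by category path separates them. The corpus, the labels, and the `taxonomic_search` function are hypothetical illustrations, not part of Ncategorizer.

```python
from collections import defaultdict

# Hypothetical corpus: each document is assumed to already carry a
# (parent category, subcategory) label assigned by a categorizer.
CORPUS = {
    "doc1": ("New virus spreads through email attachments", ("Technology", "Computer security")),
    "doc2": ("Virus outbreak prompts vaccination campaign", ("Health", "Infectious disease")),
    "doc3": ("Virus scanner update released for servers", ("Technology", "Computer security")),
    "doc4": ("Quarterly earnings beat expectations", ("Business", "Earnings")),
}

def taxonomic_search(term):
    """Return keyword hits grouped by parent category, then subcategory."""
    facets = defaultdict(lambda: defaultdict(list))
    for doc_id, (text, (parent, child)) in CORPUS.items():
        # Plain keyword match: on its own, the term is ambiguous.
        if term.lower() in text.lower().split():
            facets[parent][child].append(doc_id)
    # Convert the nested defaultdicts to plain dicts for display.
    return {parent: dict(children) for parent, children in facets.items()}

print(taxonomic_search("virus"))
# {'Technology': {'Computer security': ['doc1', 'doc3']}, 'Health': {'Infectious disease': ['doc2']}}
```

A user who picks "Technology" sees only the computer-security hits, which is the disambiguation the category hierarchy provides.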
Content Hyperlinking: Links are an important way to bolster SEO. The more links a page has, the greater its crawlability. Offering relevant, useful links generated by entity and category extraction can significantly improve page ranking.
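As a hedged illustration, extracted labels can be turned into link suggestions by linking pages whose label sets overlap. The sketch below uses Jaccard similarity as a stand-in scoring method; the page data, the `related_links` function, and the 0.3 threshold are hypothetical and not taken from Nstein's product.

```python
# Hypothetical extraction results: each page maps to the set of entities
# and categories a text-mining engine might have extracted from it.
PAGES = {
    "/news/virus-email": {"Computer security", "Email", "Malware"},
    "/news/patch-tuesday": {"Computer security", "Microsoft", "Malware"},
    "/health/flu-season": {"Infectious disease", "Vaccination"},
}

def jaccard(a, b):
    """Overlap between two label sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_links(url, threshold=0.3):
    """Suggest links to other pages whose extracted labels overlap enough."""
    labels = PAGES[url]
    scored = [
        (other, jaccard(labels, other_labels))
        for other, other_labels in PAGES.items()
        if other != url
    ]
    # Keep pages at or above the threshold, most similar first.
    kept = [(other, score) for other, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

print(related_links("/news/virus-email"))
# [('/news/patch-tuesday', 0.5)]
```

The threshold trades link volume against relevance: lowering it adds more links per page, but weaker ones.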
How it works
Once Nconcept extractor reads and linguistically analyzes a document, the identified concepts are fed into Ncategorizer, which matches them against patterns mapped in knowledge bases. Knowledge bases, which are conceptual data sets containing the semantic definition of each available category/topic, are used to match document profiles to the corresponding categories. Customer-specific rules are then applied to further refine the results, and statistical methods compute relevancy and other categorization criteria.
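The matching stage can be sketched in Python, assuming a knowledge base that defines each category as a set of weighted concepts and a document profile in the same shape. Cosine similarity stands in for Ncategorizer's undisclosed matching and statistics; `KNOWLEDGE_BASE`, `categorize`, and the rule interface are hypothetical.

```python
import math

# Hypothetical knowledge base: each category is defined by weighted concepts.
KNOWLEDGE_BASE = {
    "Technology/Computer security": {"virus": 0.6, "malware": 0.8, "firewall": 0.5},
    "Health/Infectious disease": {"virus": 0.6, "vaccine": 0.8, "outbreak": 0.7},
}

def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b.get(concept, 0.0) for concept, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def categorize(profile, rules=(), min_relevancy=0.1):
    """Score a document profile against every category, apply rules, rank."""
    scores = {cat: cosine(profile, definition)
              for cat, definition in KNOWLEDGE_BASE.items()}
    # Customer-specific rules: callables that adjust the scores in place.
    for rule in rules:
        rule(profile, scores)
    kept = [(cat, round(s, 3)) for cat, s in scores.items() if s >= min_relevancy]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

def require_medical_context(profile, scores):
    """Example rule: drop the health category unless a medical concept appears."""
    if not any(c in profile for c in ("vaccine", "outbreak")):
        scores["Health/Infectious disease"] = 0.0

# Hypothetical concept profile produced by a concept-extraction step.
profile = {"virus": 1.0, "malware": 0.5}
print(categorize(profile))
print(categorize(profile, rules=[require_medical_context]))
```

Without rules, both senses of "virus" are ranked, with computer security first; the example rule removes the health category because the profile contains no medical concepts.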
Knowledge bases can be created in three ways: by training on pre-indexed example documents (learning by example), by seeding training sets with a sample drawn from a broader collection of documents, or by using pre-trained, out-of-the-box knowledge bases featuring standard taxonomies such as IPTC and ICB.
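Learning by example can be illustrated with a simple centroid approach: each category's definition is the average concept profile of its pre-indexed training documents. This is an assumed, commonly used technique, not Nstein's documented training method, and `build_knowledge_base` and the training data are hypothetical. The result maps each category to weighted concepts, the same shape a knowledge base needs for profile matching.

```python
from collections import defaultdict

def build_knowledge_base(training_set):
    """Build category definitions by averaging example concept profiles.

    training_set: list of (concept_profile, category) pairs, where a concept
    profile maps concept -> weight (hypothetical format).
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for profile, category in training_set:
        counts[category] += 1
        concept_totals = totals[category]  # registers the category even if empty
        for concept, weight in profile.items():
            concept_totals[concept] += weight
    # Divide the summed weights by the number of examples (the centroid).
    return {
        category: {c: round(w / counts[category], 3) for c, w in concepts.items()}
        for category, concepts in totals.items()
    }

# Hypothetical pre-indexed training documents.
TRAINING_SET = [
    ({"virus": 1.0, "malware": 1.0}, "Technology/Computer security"),
    ({"virus": 0.5, "firewall": 1.0}, "Technology/Computer security"),
    ({"virus": 1.0, "vaccine": 1.0}, "Health/Infectious disease"),
]

print(build_knowledge_base(TRAINING_SET))
```

Adding more labeled examples refines each centroid, which is the sense in which such a knowledge base "learns by example".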
Nstein's TME Manager allows customers to edit their taxonomies/thesauri. It can also import or export specific lists of entities and terms.
Input/Output
Ncategorizer ingests documents in any format and outputs:
- A list of extracted categories/topics with relevancy rankings
- A global score that indicates the accuracy of the categorization
