text_analysis library

A Dart text analyzer that extracts tokens from JSON documents for use in information retrieval systems.

Classes

English
A TextAnalyzer implementation for English language analysis.
LatinLanguageAnalyzer
A TextAnalyzer implementation for analyzing Latin languages.
NGramRange
Specifies a range of n-gram sizes (minimum and maximum length).
Porter2Stemmer
Dart implementation of the Porter stemming algorithm (see https://snowballstem.org/algorithms/), used to reduce a word to its stem, base, or root form.
SimilarityIndex
Object model for a term suggested as an alternative to another term. Used in spelling correction and term expansion.
TermCoOccurrenceGraph
A RAKE (Rapid Automatic Keyword Extraction) co-occurrence graph for scoring keywords extracted from text.
TermCoOccurrenceGraphBase
Base class that implements TermCoOccurrenceGraph and mixes in TermCoOccurrenceGraphMixin.
TermSimilarity
A static/abstract class that exposes methods for computing the similarity between terms.
TextAnalyzer
An interface that exposes language-specific properties and methods used in text analysis.
TextDocument
The TextDocument object model enumerates the properties used for analysing a text document.
Token
A Token represents a term (word) present in a text source.
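
As a rough illustration of what a range of n-gram sizes means in practice, the sketch below builds every word n-gram whose length falls between a minimum and a maximum. It does not use this library's API: `SizeRange` and `wordNGrams` are hypothetical names invented for the example, and it assumes word-level (not character-level) n-grams.

```dart
/// Hypothetical stand-in for a minimum/maximum n-gram size pair.
/// This is NOT the library's NGramRange class.
class SizeRange {
  final int min;
  final int max;
  const SizeRange(this.min, this.max)
      : assert(min >= 1),
        assert(max >= min);
}

/// Returns every word n-gram of [words] whose length n satisfies
/// range.min <= n <= range.max, ordered by n and then by position.
List<String> wordNGrams(List<String> words, SizeRange range) {
  final result = <String>[];
  for (var n = range.min; n <= range.max; n++) {
    // Slide a window of n words across the input.
    for (var i = 0; i + n <= words.length; i++) {
      result.add(words.sublist(i, i + n).join(' '));
    }
  }
  return result;
}

void main() {
  final grams =
      wordNGrams(['text', 'analysis', 'library'], const SizeRange(1, 2));
  print(grams);
}
```

With a range of 1 to 2, a three-word input yields its three single words followed by its two adjacent word pairs. A range whose minimum exceeds the number of words yields an empty list.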

Enums

PartOfSpeech
In grammar, a part-of-speech is a category of words that have similar grammatical properties.
PoSTag
Part-of-speech tags are used in natural language processing to label words during part-of-speech tagging.