text_analysis library
Dart text analyzer that extracts tokens from JSON documents for use in information retrieval systems.
Classes
- English
- A TextAnalyzer implementation for English language analysis.
- LatinLanguageAnalyzer
- A TextAnalyzer implementation for the analysis of Latin languages.
- NGramRange
- Enumerates a range of N-gram sizes (minimum and maximum length).
- Porter2Stemmer
- Dart implementation of the Porter Stemming Algorithm (see https://snowballstem.org/algorithms/), used to reduce a word to its stem, base, or root form.
- SimilarityIndex
- Object model for a suggested alternative to a term. Used in spelling correction and term expansion.
- TermCoOccurrenceGraph
- A RAKE co-occurrence graph for evaluating the score of keywords extracted from text.
- TermCoOccurrenceGraphBase
- Base class that implements TermCoOccurrenceGraph and mixes in TermCoOccurrenceGraphMixin.
- TermSimilarity
- A static/abstract class that exposes methods for computing similarity of terms.
- TextAnalyzer
- An interface that exposes language-specific properties and methods used in text analysis.
- TextDocument
- The TextDocument object model enumerates properties for analysing a text document:
- Token
- A Token represents a term (word) present in a text source:
Enums
- PartOfSpeech
- In grammar, a part-of-speech is a category of words that have similar grammatical properties.
- PoSTag
- Part of speech tags are used in natural language processing as part of Part-of-Speech tagging.
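
The TermCoOccurrenceGraph entry above mentions RAKE keyword scoring. As a rough illustration of that idea, the sketch below scores terms as degree divided by frequency and scores a phrase as the sum of its term scores. It is a minimal, self-contained sketch and does not use or reproduce this library's API: `rakeTermScores` and `rakePhraseScore` are hypothetical names, and the input is assumed to be candidate phrases already split into terms with no term repeated inside a phrase.

```dart
// Hypothetical sketch of RAKE-style scoring; not the text_analysis API.

/// Scores each term as degree / frequency.
///
/// [phrases] is assumed to hold candidate keyword phrases that have
/// already been split into terms, with no term repeated in a phrase.
Map<String, double> rakeTermScores(List<List<String>> phrases) {
  final frequency = <String, int>{};
  final degree = <String, int>{};
  for (final phrase in phrases) {
    for (final term in phrase) {
      // Frequency: number of phrases in which the term occurs.
      frequency[term] = (frequency[term] ?? 0) + 1;
      // Degree: co-occurrences with every term in the phrase,
      // counting the term itself.
      degree[term] = (degree[term] ?? 0) + phrase.length;
    }
  }
  return {
    for (final term in frequency.keys) term: degree[term]! / frequency[term]!,
  };
}

/// Scores a phrase as the sum of the scores of its terms.
double rakePhraseScore(List<String> phrase, Map<String, double> termScores) =>
    phrase.fold(0.0, (sum, term) => sum + (termScores[term] ?? 0.0));

void main() {
  final phrases = [
    ['information', 'retrieval'],
    ['text', 'analysis'],
    ['information', 'retrieval', 'systems'],
  ];
  final scores = rakeTermScores(phrases);
  for (final phrase in phrases) {
    print('${phrase.join(' ')}: ${rakePhraseScore(phrase, scores)}');
  }
}
```

Terms that appear in longer phrases accumulate a higher degree, so multi-word phrases tend to outrank single common words under this scheme.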