text_indexing library

Dart library for creating an inverted index on a collection of text documents.
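
As a rough illustration of what an inverted index over a collection of text documents looks like, the sketch below maps each term to the documents that contain it and to the positions at which it occurs. It uses only dart:core; the function name buildIndex, the map-of-maps layout, and the naive tokenization are assumptions made for this example and are not the text_indexing API.

```dart
/// Builds a toy positional inverted index: term -> docId -> term positions.
/// Hypothetical helper for illustration; not part of the text_indexing API.
Map<String, Map<String, List<int>>> buildIndex(Map<String, String> corpus) {
  final index = <String, Map<String, List<int>>>{};
  corpus.forEach((docId, text) {
    // Naive analysis: lower-case, then split on runs of non-alphanumerics.
    final terms = text
        .toLowerCase()
        .split(RegExp(r'[^a-z0-9]+'))
        .where((term) => term.isNotEmpty)
        .toList();
    for (var position = 0; position < terms.length; position++) {
      index
          .putIfAbsent(terms[position], () => <String, List<int>>{})
          .putIfAbsent(docId, () => <int>[])
          .add(position);
    }
  });
  return index;
}

void main() {
  final index = buildIndex({
    'doc1': 'The quick brown fox',
    'doc2': 'The lazy dog and the quick cat',
  });
  print(index['quick']); // {doc1: [1], doc2: [5]}
}
```

A real index would pass text through a language-aware analyzer and would usually also record the zone (field) in which each term occurs; both are omitted here to keep the sketch short.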

Classes

AsyncCallbackIndex
The AsyncCallbackIndex is an InvertedIndex implementation class that extends AsyncCallbackIndexBase.
AsyncCallbackIndexBase
Base class implementation of InvertedIndex with AsyncCallbackIndexMixin.
AsyncCallbackIndexMixin
A mixin class that implements InvertedIndex. The mixin exposes five callback function fields that must be overridden:
English
A TextAnalyzer implementation for English language analysis.
InMemoryIndex
The InMemoryIndex is an implementation of the InvertedIndex interface that extends InMemoryIndexBase.
InMemoryIndexBase
Base class implementation of InvertedIndex with InMemoryIndexMixin.
InMemoryIndexMixin
A mixin class that implements InvertedIndex. The mixin exposes in-memory dictionary and postings fields that must be overridden.
InvertedIndex
An interface that exposes methods for working with an inverted, positional zoned index on a collection of documents.
LatinLanguageAnalyzer
A TextAnalyzer implementation for the analysis of Latin languages.
NGramRange
Enumerates a range of N-gram sizes (minimum and maximum length).
Porter2Stemmer
Dart implementation of the Porter Stemming Algorithm (see https://snowballstem.org/algorithms/), used for reducing a word to its word stem, base or root form.
SimilarityIndex
Object model for a term suggested as an alternative to another term. Used in spelling correction and term expansion.
TermCoOccurrenceGraph
A RAKE co-occurrence graph for evaluating the score of keywords extracted from text.
TermCoOccurrenceGraphBase
Base class that implements TermCoOccurrenceGraph and mixes in TermCoOccurrenceGraphMixin.
TermSimilarity
A static/abstract class that exposes methods for computing similarity of terms.
TextAnalyzer
An interface that exposes language-specific properties and methods used in text analysis.
TextDocument
The TextDocument object model enumerates properties for analysing a text document:
TextIndexer
Interface for classes that construct and maintain an InvertedIndex for a collection of documents (corpus).
TextIndexerMixin
Mixin class implementation of the TextIndexer interface.
Token
A Token represents a term (word) present in a text source:
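
The TermCoOccurrenceGraph entry above mentions RAKE keyword scoring. The sketch below shows the degree/frequency word score commonly used with RAKE, in which a word's degree counts every word it shares a candidate phrase with (itself included) and a phrase's score is the sum of its word scores. The function names wordScores and phraseScore and the list-of-lists input are assumptions for illustration only; this is not how the library's classes are implemented or called.

```dart
/// Scores each word of the candidate phrases as degree / frequency.
/// Hypothetical helper for illustration; not part of the text_indexing API.
Map<String, double> wordScores(List<List<String>> phrases) {
  final frequency = <String, int>{};
  final degree = <String, int>{};
  for (final phrase in phrases) {
    for (final word in phrase) {
      frequency[word] = (frequency[word] ?? 0) + 1;
      // A word co-occurs with every word in its phrase, itself included.
      degree[word] = (degree[word] ?? 0) + phrase.length;
    }
  }
  return {
    for (final word in frequency.keys) word: degree[word]! / frequency[word]!,
  };
}

/// A phrase's score is the sum of the scores of its words.
double phraseScore(List<String> phrase, Map<String, double> scores) =>
    phrase.fold<double>(0.0, (sum, word) => sum + (scores[word] ?? 0.0));

void main() {
  final phrases = [
    ['inverted', 'index'],
    ['positional', 'inverted', 'index'],
    ['corpus'],
  ];
  final scores = wordScores(phrases);
  print(scores['inverted']); // 2.5, from degree 5 and frequency 2
  print(phraseScore(phrases[1], scores)); // sum of 3.0 + 2.5 + 2.5
}
```

Longer phrases tend to score higher under this scheme because every word in them gains degree, which is why RAKE favours multi-word keywords.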

Enums

PartOfSpeech
In grammar, a part-of-speech is a category of words that have similar grammatical properties.
PoSTag
Part-of-speech tags are used in natural language processing to label words during part-of-speech tagging.