InvertedIndex class abstract
An interface that exposes methods for working with an inverted, positional zoned index on a collection of documents.
- analyzer is the TextAnalyzer used to index the corpus terms.
- vocabularyLength is the number of unique terms in the corpus.
- zones is a hashmap of zone names to their relative weight in the index. If zones is empty, all the JSON fields will be indexed.
- k is the length of k-gram entries in the k-gram index.
- nGramRange is the range of n-gram lengths to generate. The minimum n-gram length is 1. If nGramRange.max is greater than 1, the index vocabulary also contains n-grams of up to nGramRange.max terms, concatenated from consecutive terms, which increases the index size by a factor of nGramRange.max.
- getDictionary asynchronously retrieves a DftMap for a collection of terms from a DftMap repository.
- upsertDictionary inserts entries into a DftMap repository, overwriting any existing entries.
- getKGramIndex asynchronously retrieves a KGramsMap for a collection of k-grams from a KGramsMap repository.
- upsertKGramIndex inserts entries into a KGramsMap repository, overwriting any existing entries.
- getPostings asynchronously retrieves a PostingsMap for a collection of terms from a PostingsMap repository.
- upsertPostings inserts entries into a PostingsMap repository, overwriting any existing entries.

The following static methods are used to work with PostingsMap and DftMap objects.
- tfIndexFromPostings returns a hashmap of term to Ft for the terms in a PostingsMap, where Ft is the number of times each term occurs in the corpus.
- ftdPostingsFromPostings returns an FtdPostings for a collection of terms from a PostingsMap, optionally filtered by minimum term frequency.
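The k and nGramRange settings described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the helper names nGrams and kGrams are hypothetical, joining terms with a space is an assumption, and padding terms with `$` boundary markers is the common textbook convention rather than something this page confirms.

```dart
/// Hypothetical helper (not part of the documented API): builds n-grams of
/// [minN] to [maxN] consecutive terms. Joining with a space is an assumption.
List<String> nGrams(List<String> terms, int minN, int maxN) {
  final result = <String>[];
  for (var n = minN; n <= maxN; n++) {
    for (var i = 0; i + n <= terms.length; i++) {
      result.add(terms.sublist(i, i + n).join(' '));
    }
  }
  return result;
}

/// Hypothetical helper: the set of k-grams of [term], padded with '$'
/// boundary markers (an assumed convention).
Set<String> kGrams(String term, int k) {
  final padded = '\$$term\$';
  return {
    for (var i = 0; i + k <= padded.length; i++) padded.substring(i, i + k),
  };
}

void main() {
  // [inverted, positional, index, inverted positional, positional index]
  print(nGrams(['inverted', 'positional', 'index'], 1, 2));
  // {$i, in, nd, de, ex, x$}
  print(kGrams('index', 2));
}
```

With a range of 1 to 2, three terms yield five vocabulary entries, which shows why a larger nGramRange.max enlarges the index.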
Constructors
- InvertedIndex({required CollectionSizeCallback collectionSizeLoader, required DftMapLoader dictionaryLoader, required DftMapUpdater dictionaryUpdater, required CollectionSizeCallback dictionaryLengthLoader, required KGramsMapLoader kGramIndexLoader, required KGramsMapUpdater kGramIndexUpdater, required PostingsMapLoader postingsLoader, required PostingsMapUpdater postingsUpdater, required KeywordPostingsMapLoader keywordPostingsLoader, required KeywordPostingsMapUpdater keywordPostingsUpdater, required TextAnalyzer analyzer, TokenFilter? tokenFilter, Map<String, int>? dictionary, Map<String, Map<String, Map<String, List<int>>>>? postings, Map<String, Set<String>>? kGramIndex, int k = 2, NGramRange? nGramRange, Map<String, double> zones = const <String, double>{}})
  A factory constructor that returns an AsyncCallbackIndex instance.
  factory
- InvertedIndex.inMemory({required TextAnalyzer analyzer, required int collectionSize, Map<String, int>? dictionary, Map<String, Map<String, Map<String, List<int>>>>? postings, KeywordPostingsMap? keywordPostings, Map<String, Set<String>>? kGramIndex, int k = 2, NGramRange? nGramRange, Map<String, double> zones = const <String, double>{}})
  A factory constructor that returns an InMemoryIndex instance.
  factory
Properties
- analyzer → TextAnalyzer
  The text analyser that extracts tokens from text for the index.
  no setter
- hashCode → int
  The hash code for this object.
  no setter, inherited
- k → int
  The length of k-gram entries in the k-gram index.
  no setter
- nGramRange → NGramRange?
  The minimum and maximum length of n-grams in the index.
  no setter
- runtimeType → Type
  A representation of the runtime type of the object.
  no setter, inherited
- vocabularyLength → Future<Ft>
  Returns the number of terms in the vocabulary (N).
  no setter
- zones → ZoneWeightMap
  Maps zone names to their relative weight in the index.
  no setter
Methods
- getCollectionSize() → Future<int>
  Asynchronously returns the total number of documents in the indexed collection.
- getDictionary([Iterable<String>? terms]) → Future<DftMap>
  Asynchronously retrieves a DftMap for the terms from a DftMap repository.
- getKeywordPostings(Iterable<String> keywords) → Future<KeywordPostingsMap>
  Asynchronously retrieves a KeywordPostingsMap for the keywords from a KeywordPostingsMap repository.
- getKGramIndex(Iterable<String> kGrams) → Future<KGramsMap>
  Asynchronously retrieves a KGramsMap for the kGrams from a KGramsMap repository.
- getPostings(Iterable<String> terms) → Future<PostingsMap>
  Asynchronously retrieves PostingsMapEntry entities for the terms from a PostingsMap repository.
- noSuchMethod(Invocation invocation) → dynamic
  Invoked when a nonexistent method or property is accessed.
  inherited
- toString() → String
  A string representation of this object.
  inherited
- upsertDictionary(DftMap values) → Future<void>
  Inserts values into a DftMap repository, overwriting them if they already exist.
- upsertKeywordPostings(KeywordPostingsMap values) → Future<void>
  Inserts values into a KeywordPostingsMap repository, overwriting them if they already exist.
- upsertKGramIndex(KGramsMap values) → Future<void>
  Inserts values into a KGramsMap repository, overwriting any existing entries.
- upsertPostings(PostingsMap values) → Future<void>
  Inserts values into a PostingsMap repository, overwriting them if they already exist.
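The get/upsert pairs above all follow the same pattern, which the following sketch illustrates with a hypothetical in-memory stand-in for a DftMap repository. The class InMemoryDictionary is invented for illustration and is not the library's InMemoryIndex. It also assumes that DftMap is a map of term to document frequency, that a null terms argument returns every entry, and that unknown terms are omitted from the result; none of these details are confirmed by this page.

```dart
// Assumption: DftMap maps each term to its document frequency.
typedef DftMap = Map<String, int>;

/// Hypothetical in-memory stand-in for a DftMap repository.
class InMemoryDictionary {
  final DftMap _store = {};

  /// Returns the entries for [terms]. Assumed behavior: all entries are
  /// returned when [terms] is null, and unknown terms are left out.
  Future<DftMap> getDictionary([Iterable<String>? terms]) async {
    if (terms == null) return Map.of(_store);
    return {
      for (final term in terms)
        if (_store.containsKey(term)) term: _store[term]!,
    };
  }

  /// Inserts [values], overwriting entries that already exist.
  Future<void> upsertDictionary(DftMap values) async {
    _store.addAll(values);
  }
}

Future<void> main() async {
  final repo = InMemoryDictionary();
  await repo.upsertDictionary({'index': 2, 'term': 1});
  await repo.upsertDictionary({'index': 5}); // overwrites the earlier entry
  print(await repo.getDictionary(['index', 'missing'])); // {index: 5}
}
```

Keeping both operations asynchronous lets the same calling code work whether the repository is an in-memory map or a remote store.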
Operators
- operator ==(Object other) → bool
  The equality operator.
  inherited
Static Methods
- docTermFrequencies(PostingsMap postings, [ZoneWeightMap? zones]) → Map<String, double>
  Returns a hashmap of term to weighted document term frequency by iterating over the postings to aggregate the document frequency for the term.
- ftdPostingsFromPostings(PostingsMap postings, [Ft minFtd = 1]) → FtdPostings
  Returns a map of terms to hashmaps of document id to Ft.
- tfIndexFromPostings(PostingsMap postings) → DftMap
  Returns a DftMap by iterating over the postings to aggregate the document frequency for the term.
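The kind of aggregation these static methods perform can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the functions documentFrequencies and termFrequenciesByDocument are hypothetical, and the PostingsMap nesting (term, then document id, then zone name, then a list of term positions) is inferred from the constructor signature rather than stated on this page.

```dart
// Assumed shape: term -> document id -> zone name -> term positions.
typedef PostingsMap = Map<String, Map<String, Map<String, List<int>>>>;

/// Hypothetical: maps each term to the number of documents that contain it.
Map<String, int> documentFrequencies(PostingsMap postings) =>
    postings.map((term, docs) => MapEntry(term, docs.length));

/// Hypothetical: maps each term to a map of document id to the number of
/// occurrences in that document, keeping only documents with at least
/// [minFtd] occurrences. Terms with no remaining documents are dropped.
Map<String, Map<String, int>> termFrequenciesByDocument(PostingsMap postings,
    [int minFtd = 1]) {
  final result = <String, Map<String, int>>{};
  postings.forEach((term, docs) {
    final perDoc = <String, int>{};
    docs.forEach((docId, zones) {
      // Sum the position counts over every zone of the document.
      final ftd = zones.values.fold<int>(0, (sum, p) => sum + p.length);
      if (ftd >= minFtd) perDoc[docId] = ftd;
    });
    if (perDoc.isNotEmpty) result[term] = perDoc;
  });
  return result;
}

void main() {
  final PostingsMap postings = {
    'index': {
      'doc1': {'title': [0], 'body': [4, 9]},
      'doc2': {'body': [2]},
    },
    'zone': {
      'doc2': {'body': [7]},
    },
  };
  print(documentFrequencies(postings)); // {index: 2, zone: 1}
  print(termFrequenciesByDocument(postings, 2)); // {index: {doc1: 3}}
}
```

The second call shows the effect of a minimum frequency filter: only doc1, where the term occurs three times across its zones, survives a threshold of 2.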