text_analysis 1.0.0+2
text_analysis: ^1.0.0+2
Tokenize text, compute readability scores for a document and evaluate similarity of terms.
1.0.0+2 #
- Stable release.
0.24.0 #
Breaking changes #
- Interface `TextTokenizer` removed. Use `TextAnalyzer.tokenize` and `TextAnalyzer.tokenizeJson` instead.
- Deleted mixin `LatinLanguageAnalyzerMixin`.
- Moved class `TermSimilarityBase` from `text_analysis` library.
- Moved all mixins and base-classes to `implementation` mini-library.
- Changed signature of function definition `Tokenizer`.
- Changed signature of function definition `JsonTokenizer`.
Bug fixes #
- Fixed tokenizer phrase splitter bug.
- Fixed tokenizer term position bug.
New #
- Added extension method `Map<String, double> toKeywordScores()` on `Iterable<Token>`.
- New method `TextAnalyzer.tokenize`.
- New method `TextAnalyzer.tokenizeJson`.
- New class `LatinLanguageAnalyzer`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.24.0-5 #
0.24.0-3 #
0.24.0-2 #
BREAKING CHANGES
Breaking changes #
- Interface `TextTokenizer` removed. Use `TextAnalyzer.tokenize` and `TextAnalyzer.tokenizeJson` instead.
- Deleted mixin `LatinLanguageAnalyzerMixin`.
- Moved class `TermSimilarityBase` from `text_analysis` library.
- Moved all mixins and base-classes to `implementation` mini-library.
New #
- New method `TextAnalyzer.tokenize`.
- New method `TextAnalyzer.tokenizeJson`.
- New class `LatinLanguageAnalyzer`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.24.0-1 #
BREAKING CHANGES
Breaking changes #
- Interface `TextTokenizer` removed. Use `TextAnalyzer.tokenize` and `TextAnalyzer.tokenizeJson` instead.
- Deleted mixin `LatinLanguageAnalyzerMixin`.
- Moved class `TermSimilarityBase` from `text_analysis` library.
- Moved all mixins and base-classes to `implementation` mini-library.
New #
- New method `TextAnalyzer.tokenize`.
- New method `TextAnalyzer.tokenizeJson`.
- New class `LatinLanguageAnalyzer`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.7+14 #
0.23.7+12 #
Bug fixes #
- Changed signature of `TextTokenizer.tokenize` and `TextTokenizer.tokenizeJson` to make parameter `nGramRange` nullable.
0.23.7+7 #
Bug fixes #
- Fixed keyword extraction bug.
- Changed signature of extension method `kGrams` on String.
0.23.7+5 #
Bug fixes #
- Removed implementation library and added its exports to text_analysis library.
0.23.7 #
0.23.5 #
BREAKING CHANGES
Breaking changes #
- Added field `TermSimilarity.startsWithSimilarity`.
- Changed signature of `TermSimilarity` unnamed factory constructor.
- Changed calculation of `getSuggestions` extension method to include `startsWithSimilarity`.
New #
- Extension method `double startsWithSimilarity(Term other)` on `String`.
- Extension method `List<String> startsWith(Iterable<String> terms, [int limit = 10])` on `String`.
- Extension method `Map<String, double> startsWithSimilarityMap(Iterable<String> terms)` on `String`.
- Extension method `List<SimilarityIndex> startsWithSimilarities(Iterable<String> terms)` on `String`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.4d #
Bug fixes #
- Fixed k-gram generation error.
0.23.3 #
BREAKING CHANGES
Breaking changes #
- Changed signature of String extension method `termSimilarities`.
- Changed signature of String extension method `termSimilarityMap`.
- Changed signature of String extension method `getSuggestions`.
- Changed signature of String extension method `matches`.
- Changed signature of static method `TermSimilarity.termSimilarities`.
- Changed signature of static method `TermSimilarity.termSimilarityMap`.
- Changed signature of static method `TermSimilarity.getSuggestions`.
- Changed signature of static method `TermSimilarity.matches`.
- Changed calculation of `getSuggestions`.
Updated #
- Dependencies.
- Documentation.
0.23.2 #
0.23.1 #
Non-breaking changes #
- Added optional parameters to function definition `Tokenizer`.
- Added optional parameters to function definition `JsonTokenizer`.
- Added optional parameters to function definition `KeywordExtractor`.
Bug fixes #
- Fixed keyword extractor to return all keywords as lower-case.
- Fixed tokenizer to not return duplicate tokens (same term, zone and termPosition).
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.0+1 #
0.23.0 #
0.22.0 #
BREAKING CHANGES
Breaking changes #
- Added field `TextAnalyzer.phraseSplitter`.
- Added field `TextDocument.keywords`.
- Changed signature of `TextDocument` unnamed factory constructor.
- Moved export of all mixins and base-classes to `implementation` mini-library.
- Changed function definition `TermFilter`.
New #
- New enum `TokenizingStrategy`.
- New class `TermCoOccurrenceGraph`.
- New mixin class `LatinLanguageAnalyzerMixin`.
- New type alias `Phrase`.
- New function definition `KeywordExtractor`.
- New extension method `Set<String> toUniqueTerms()` on `Iterable<List<String>>`.
- New extension method `Map<String, List<int>> coOccurenceGraph(List<String> terms)` on `Iterable<List<String>>`.
- Added optional named parameter `TokenizingStrategy strategy` to `TextTokenizer.tokenize` method.
- Added optional named parameter `TokenizingStrategy strategy` to `TextTokenizer.tokenizeJson` method.
- Implemented method `English.KeywordExtractor`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.21.0 #
BREAKING CHANGES
Breaking changes #
- Static method `TextDocument.analyze` signature changed. Default for parameter `nGramRange` changed to `NGramRange(1, 1)`.
- Static method `TextDocument.analyzeJson` signature changed. Default for parameter `nGramRange` changed to `NGramRange(1, 1)`.
- Method `TextTokenizer.tokenize` signature changed. Default for parameter `nGramRange` changed to `NGramRange(1, 1)`.
- Method `TextTokenizer.tokenizeJson` signature changed. Default for parameter `nGramRange` changed to `NGramRange(1, 1)`.
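To illustrate what narrowing the default range from `(1, 2)` to `(1, 1)` could mean in practice, here is a minimal sketch of n-gram generation over a range. The helper `nGramsInRange` and its plain `int` parameters are hypothetical stand-ins, not the package's `NGramRange` API, and the package's own algorithm may differ.

```dart
// Hypothetical helper, not the package's implementation: returns every
// n-gram of `terms` whose length n satisfies min <= n <= max, with the
// words of each n-gram joined by a single space.
List<String> nGramsInRange(List<String> terms, int min, int max) {
  final result = <String>[];
  for (var n = min; n <= max; n++) {
    // Slide a window of n consecutive terms across the list.
    for (var start = 0; start + n <= terms.length; start++) {
      result.add(terms.sublist(start, start + n).join(' '));
    }
  }
  return result;
}
```

Under these assumptions, `nGramsInRange(['quick', 'brown', 'fox'], 1, 1)` returns only the three single words, while a range of `1` to `2` also returns `quick brown` and `brown fox`.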
Bug fixes #
- Fixed bugs where n-grams would contain repeated words.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.20.0 #
BREAKING CHANGES
Breaking changes #
- Renamed `NGramRange.nMin` to `NGramRange.min`.
- Renamed `NGramRange.nMax` to `NGramRange.max`.
- Changed signature of `TextDocument` unnamed factory.
- Changed signature of `TextDocument.analyze` factory.
- Changed signature of `TextDocument.analyzeJson` factory.
- Removed field `TextDocument.analyzer`.
New #
- Implemented `NGramRange.==` and `NGramRange.hashCode`.
- Added extension method `nGrams(NGramRange range)` on `List<String>`.
- Added typedef `NGrammer = List<String> Function(String text, NGramRange range)`.
- Added field `TextAnalyzer.nGrammer`.
- Implemented field `English.nGrammer`.
- Added field `TextDocument.syllableCount`.
- Added field `TextDocument.nGrams`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.19.0 #
BREAKING CHANGES
Breaking changes #
- Changed signature of `TextTokenizer.tokenize`.
- Changed signature of `TextTokenizer.tokenizeJson`.
- Changed `TextTokenizer.tokenize` algorithm to generate an n-gram for each token, using an n-gram range.
- Changed signature of `Token` default constructor by adding unnamed parameter `Token.n`.
New #
- Added class `NGramRange`.
- Added field `int Token.n`.
- Added optional named parameter `NGramRange nGramRange = NGramRange(1, 2)` to `TextDocument.analyze` factory.
- Added optional named parameter `NGramRange nGramRange = NGramRange(1, 2)` to `TextDocument.analyzeJson` factory.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.18.0 #
BREAKING CHANGES
Breaking changes #
- Removed static method `TermSimilarity.editDistance`.
- Removed static method `TermSimilarity.editSimilarity`.
- Removed static method `TermSimilarity.editSimilaritiesMap`.
- Removed static method `TermSimilarity.lengthDistance`.
- Removed static method `TermSimilarity.lengthSimilarity`.
- Removed static method `TermSimilarity.lengthSimilaritiesMap`.
- Removed static method `TermSimilarity.jaccardSimilarity`.
- Removed static method `TermSimilarity.jaccardSimilaritiesMap`.
- Removed static method `TermSimilarity.termSimilarities`.
- Removed static method `TermSimilarity.termSimilarity`.
- Changed signature of String extension method `termSimilarities`.
- Changed signature of String extension method `termSimilarityMap`.
- Changed signature of String extension method `getSuggestions`.
- Changed signature of String extension method `matches`.
- Changed signature of static method `TermSimilarity.termSimilarities`.
- Changed signature of static method `TermSimilarity.termSimilarityMap`.
- Changed signature of static method `TermSimilarity.getSuggestions`.
- Changed signature of static method `TermSimilarity.matches`.
New #
- Added mixin class `TermSimilarityMixin`.
- Added base class `TermSimilarityBase`.
- Added class property `TermSimilarity.term`.
- Added class property `TermSimilarity.other`.
- Added class property `TermSimilarity.editDistance`.
- Added class property `TermSimilarity.editSimilarity`.
- Added class property `TermSimilarity.lengthDistance`.
- Added class property `TermSimilarity.lengthSimilarity`.
- Added class property `TermSimilarity.jaccardSimilarity`.
- Added class property `TermSimilarity.characterSimilarity`.
- Added class property `TermSimilarity.similarity`.
- Added class method `TermSimilarity.toJson()`.
- Added class method `TermSimilarity.compareTo(TermSimilarity other)`.
- Added extension method `sortBySimilarity(bool descending = true)` on `Iterable<TermSimilarity>`.
- Added extension method `limit(int? limit)` on `Iterable<TermSimilarity>`.
- Added extension method `sortBySimilarity(bool descending = true)` on `Iterable<SimilarityIndex>`.
- Added extension method `limit(int? limit)` on `Iterable<SimilarityIndex>`.
- Added unnamed constructor `TermSimilarity`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.17.0 #
BREAKING CHANGES
Breaking changes #
- Changed algorithm for calculating `TermSimilarity.termSimilarity` to apply weighting.
New #
- Added class `SimilarityIndex`.
- Added String extension method `getSuggestions`.
- Added class method `TermSimilarity.getSuggestions`.
- Added String extension method `lengthSimilarities`.
- Added class method `TermSimilarity.lengthSimilarities`.
- Added String extension method `editSimilarities`.
- Added class method `TermSimilarity.editSimilarities`.
- Added String extension method `jaccardSimilarities`.
- Added class method `TermSimilarity.jaccardSimilarities`.
- Added String extension method `termSimilarities`.
- Added class method `TermSimilarity.termSimilarities`.
- Added function definition `KGramsMap`.
- Added extension `Set<KGram> toKGramsMap([int k = 2])` on `Iterable<String>`.
- Added enumeration `PartOfSpeech`.
- Added enumeration `PoSTag`.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.16.0 #
BREAKING CHANGES
Breaking changes #
- Removed `stopWords` and `abbreviations` fields from `TextAnalyzer`.
- Changed implementation of `TextTokenizer.tokenize` to match removal of `stopWords` and `abbreviations` from `TextAnalyzer`.
- Moved all constants from `English` to `EnglishConstants`.
- Removed parameters `stemmer`, `lemmatizer`, `stopWords` and `abbreviations` from `English`. Extend the `English` class to use different values for these fields.
- Changed implementation of `English.termSplitter`.
- Changed signature of `TextTokenizer` unnamed factory constructor, now requires `analyzer` parameter.
New #
- New mini-library `constants`.
- New extension class on String `EnglishStringExtensions` added to `extensions` mini-library.
- New static const `TextTokenizer.english` shortcut factory method.
Bug fixes #
- Fixed handling of accented characters in `English.syllableCounter`.
- Fixed bugs in `English.syllableCounter` to improve accuracy when dealing with hyphenated terms, abbreviations and apostrophes of contraction.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.15.1 #
0.15.0 #
0.14.0 #
BREAKING CHANGES
Breaking changes #
- Removed library `package_exports`.
- The `Porter2Stemmer` class from the `porter_2_stemmer` package is exported by the `text_indexer` library.
- The `Porter2StemmerExtension` String extension is exported by the `extensions` library.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0 #
BREAKING CHANGES
Breaking changes #
- Added field `TextAnalyzer.stemmer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.stopWords` to `TextAnalyzer` class.
- Added field `TextAnalyzer.lemmatizer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.termExceptions` to `TextAnalyzer` class.
- Removed static field `TextTokenizer.defaultTokenFilter`.
- Changed `TextTokenizer.tokenize` method to apply `analyzer.stemmer`, `analyzer.stopWords`, `analyzer.lemmatizer` and `analyzer.termExceptions` to all tokens/terms.
New #
- Implemented field `English.stemmer` in `English` class.
- Implemented field `English.stopWords` in `English` class.
- Implemented field `English.lemmatizer` in `English` class.
- Implemented field `English.termExceptions` in `English` class.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0-1 #
BREAKING CHANGES
Breaking changes #
- Added field `TextAnalyzer.stemmer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.stopWords` to `TextAnalyzer` class.
- Added field `TextAnalyzer.lemmatizer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.termExceptions` to `TextAnalyzer` class.
- Removed static field `TextTokenizer.defaultTokenFilter`.
- Changed `TextTokenizer.tokenize` method to apply `analyzer.stemmer`, `analyzer.stopWords`, `analyzer.lemmatizer` and `analyzer.termExceptions` to all tokens/terms.
New #
- Implemented field `English.stemmer` in `English` class.
- Implemented field `English.stopWords` in `English` class.
- Implemented field `English.lemmatizer` in `English` class.
- Implemented field `English.termExceptions` in `English` class.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.1 #
0.12.0 #
BREAKING CHANGES
Breaking changes #
- String extensions `extension TermSimilarityExtensions on String` removed from `text_analysis` library. Import the `extensions` library instead.
- Type definitions removed from `text_analysis` library. Import the `type_definitions` library instead.
- Package export `porter_2_stemmer` removed from `text_analysis` library, import the `package_exports` library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added `int term.editDistance(String other)` extension on String.
- Added `double term.editDistanceSimilarity(String other)` extension on String.
- Added class `TermSimilarity` that exposes static methods for comparing terms.
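As a rough illustration of what an edit-distance extension such as `editDistance` computes, here is a minimal sketch of the classic Levenshtein distance. The function name `levenshtein` is hypothetical, and the changelog does not say which edit-distance variant the package uses, so treat the choice of algorithm as an assumption rather than a description of the package's implementation.

```dart
// Illustrative Levenshtein distance (insertions, deletions, substitutions).
// This is an assumption for demonstration; the package's own `editDistance`
// may use a different variant.
int levenshtein(String a, String b) {
  // previous[j] is the distance between the first i - 1 code units of `a`
  // and the first j code units of `b`.
  var previous = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final current = List<int>.filled(b.length + 1, 0);
    current[0] = i; // Deleting the first i code units of `a`.
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      final deletion = previous[j] + 1;
      final insertion = current[j - 1] + 1;
      final substitution = previous[j - 1] + cost;
      var best = deletion < insertion ? deletion : insertion;
      if (substitution < best) best = substitution;
      current[j] = best;
    }
    previous = current;
  }
  return previous[b.length];
}
```

With this sketch, `levenshtein('kitten', 'sitting')` evaluates to `3`: two substitutions and one insertion.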
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0-2 #
BREAKING CHANGES
Breaking changes #
- String extensions `extension TermSimilarityExtensions on String` removed from `text_analysis` library. Import the `extensions` library instead.
- Type definitions removed from `text_analysis` library. Import the `type_definitions` library instead.
- Package export `porter_2_stemmer` removed from `text_analysis` library, import the `package_exports` library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added `int term.editDistance(String other)` extension on String.
- Added `double term.editDistanceSimilarity(String other)` extension on String.
- Added class `TermSimilarity` that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0-1 #
BREAKING CHANGES
Breaking changes #
- String extensions `extension TermSimilarityExtensions on String` removed from `text_analysis` library. Import the `extensions` library instead.
- Type definitions removed from `text_analysis` library. Import the `type_definitions` library instead.
- Package export `porter_2_stemmer` removed from `text_analysis` library, import the `package_exports` library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added `int term.editDistance(String other)` extension on String.
- Added `double term.editDistanceSimilarity(String other)` extension on String.
- Added class `TermSimilarity` that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.11.2 #
0.11.1 #
0.11.0 #
BREAKING CHANGES
This version sees numerous breaking changes, including the re-naming of the primary interfaces of the library.
Breaking changes #
- Renamed `TextAnalyzer` interface to `TextTokenizer`.
- Renamed `TextAnalyzerConfiguration` interface to `TextAnalyzer`.
- Added `SentenceSplitter get sentenceSplitter` to `TextAnalyzer` interface.
- Added `ParagraphSplitter get paragraphSplitter` to `TextAnalyzer` interface.
- Added `SyllableCounter get syllableCounter` to `TextAnalyzer` interface.
- Added `List<String> paragraphs(SourceText source)` to `ITextTokenizer` interface.
- Moved class `TextTokenizer` to a private implementation class `_TextTokenizerImpl` and renamed `ITextTokenizer` interface to `TextTokenizer`.
New #
- Added mixin class `TextTokenizerMixin`.
- Added object model `TextDocument`.
- Added typedef `SyllableCounter`.
- Added unnamed factory constructor to `TextTokenizer` that initializes a `_TextTokenizerImpl`.
- Added `SentenceSplitter get sentenceSplitter` to `English` class.
- Added `ParagraphSplitter get paragraphSplitter` to `English` class.
- Added `SyllableCounter get syllableCounter` to `English` class.
- Added `TextDocument` interface.
- Added `TextDocumentMixin` mixin class.
- Added `TextDocument` unnamed factory with private implementation class.
- Added `TextDocument.analyze` factory constructor.
- Added `TextDocument.analyzeJson` factory constructor.
- Added extension on String `double lengthDistance(Term other)`.
- Added extension on String `double lengthSimilarity(Term other)`.
- Added extension on String `Map<Term, double> lengthSimilarityMap(Iterable<Term> terms)`.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.10.0 #
BREAKING CHANGES
This version sees numerous breaking changes, including the re-naming of the primary interfaces of the library.
Breaking changes #
- Renamed `TextAnalyzer` interface to `TextTokenizer`.
- Renamed `TextAnalyzerConfiguration` interface to `TextAnalyzer`.
- Added `SentenceSplitter get sentenceSplitter` to `TextAnalyzer` interface.
- Added `ParagraphSplitter get paragraphSplitter` to `TextAnalyzer` interface.
- Added `SyllableCounter get syllableCounter` to `TextAnalyzer` interface.
- Added `List<String> paragraphs(SourceText source)` to `ITextTokenizer` interface.
- Moved class `TextTokenizer` to a private implementation class `_TextTokenizerImpl` and renamed `ITextTokenizer` interface to `TextTokenizer`.
New #
- Added mixin class `TextTokenizerMixin`.
- Added object model `TextDocument`.
- Added typedef `SyllableCounter`.
- Added unnamed factory constructor to `TextTokenizer` that initializes a `_TextTokenizerImpl`.
- Added `SentenceSplitter get sentenceSplitter` to `English` class.
- Added `ParagraphSplitter get paragraphSplitter` to `English` class.
- Added `SyllableCounter get syllableCounter` to `English` class.
- Added `TextDocument` interface.
- Added `TextDocumentMixin` mixin class.
- Added `TextDocument` unnamed factory with private implementation class.
- Added `TextDocument.analyze` factory constructor.
- Added `TextDocument.analyzeJson` factory constructor.
- Added extension on String `double lengthDistance(Term other)`.
- Added extension on String `double lengthSimilarity(Term other)`.
- Added extension on String `Map<Term, double> lengthSimilarityMap(Iterable<Term> terms)`.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.9.1 #
0.9.0 #
BREAKING CHANGES
Breaking changes #
- Removed class `TextSource`.
- Removed class `Sentence`.
- Removed class `TermPair`.
- Removed `TextAnalyzer.sentenceSplitter` from `TextAnalyzer` interface.
- Changed `TextTokenizer.tokenize` return value to `List<Token>`.
- Changed `TextTokenizer.tokenizeJson` return value to `List<Token>`.
0.8.1 #
0.8.0 #
0.7.0 #
0.6.5+1 #
PRE-RELEASE
Minor bug fixes, updated dependencies, tests, examples and documentation.
0.6.5 #
0.6.4 #
0.6.3 #
0.6.2 #
0.6.1 #
PRE-RELEASE
- Added type aliases to improve code readability.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.0 #
0.5.0 #
0.4.1 #
0.4.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Added `Token.field` property to token, breaks default generative constructor.
- Added `FieldName? field` optional parameter to `TextTokenizer.tokenize` method.
- Removed deprecated property `Token.index`, use `Token.termPosition` instead.
- Removed deprecated property `Token.position`, use `Token.termPosition` instead.
- Removed deprecated extension method `Iterable<Token>.maxIndex`, use `Iterable<Token>.Iterable`
- Removed extension method `Iterable<Token>.minIndex`, use `Iterable<Token>.Iterable`
New #
- Added new method `ITextAnalyser.tokenizeJson`.
- Added new tests.
- Added new examples.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.3.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- `TextAnalyzer.characterFilter` changed to non-nullable. Use `(phrase) => phrase` if no `characterFilter` is required.
- `TextAnalyzer.termFilter` changed to non-nullable. Use `(phrase) => [phrase]` if no `termFilter` is required.
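The two pass-through expressions quoted above can be pictured with a small sketch. The typedefs `CharacterFilter` and `TermFilter` below, along with the helper `applyFilters`, are hypothetical stand-ins written for this illustration; the package's real filter signatures are not shown in this changelog and may differ (for example, they may be asynchronous).

```dart
// Hypothetical stand-ins for the package's filter types.
typedef CharacterFilter = String Function(String phrase);
typedef TermFilter = List<String> Function(String phrase);

// Pass-through filters using the expressions quoted in the changelog:
// the character filter returns the phrase unchanged, and the term filter
// wraps the phrase in a single-element list.
final CharacterFilter noOpCharacterFilter = (phrase) => phrase;
final TermFilter noOpTermFilter = (phrase) => [phrase];

// Hypothetical pipeline: clean the phrase first, then expand it into terms.
List<String> applyFilters(
    String phrase, CharacterFilter characterFilter, TermFilter termFilter) {
  return termFilter(characterFilter(phrase));
}
```

Under these assumptions, passing both no-op filters to `applyFilters` returns a one-element list containing the original phrase, which is the effect of supplying "no filter" once the fields are non-nullable.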
New #
- Added `porter_2_stemmer` package export so it does not need to be imported separately.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.2.0 #
0.1.0 #
0.0.12 #
0.0.9-beta.1 #
0.0.8 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Removed `relevance` extension method from `TokenCollectionExtension`.
0.0.3 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Stemmer removed from English configuration.
- Stemmer incorporated into default tokenFilter for `TextTokenizer`.
0.0.1-beta.1 #
PRE-RELEASE
Initial version.