text_analysis 1.0.0+2
Tokenize text, compute readability scores for a document, and evaluate the similarity of terms.
1.0.0+2 #
- Stable release.
0.24.0 #
Breaking changes #
- Interface TextTokenizer removed. Use TextAnalyzer.tokenize and TextAnalyzer.tokenizeJson instead.
- Deleted mixin LatinLanguageAnalyzerMixin.
- Moved class TermSimilarityBase from text_analysis library.
- Moved all mixins and base-classes to implementation mini-library.
- Changed signature of function definition Tokenizer.
- Changed signature of function definition JsonTokenizer.
Bug fixes #
- Fixed tokenizer phrase splitter bug.
- Fixed tokenizer term position bug.
New #
- Added extension method Map<String, double> toKeywordScores() on Iterable<Token>.
- New method TextAnalyzer.tokenize.
- New method TextAnalyzer.tokenizeJson.
- New class LatinLanguageAnalyzer.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.24.0-5 #
0.24.0-3 #
0.24.0-2 #
BREAKING CHANGES
Breaking changes #
- Interface TextTokenizer removed. Use TextAnalyzer.tokenize and TextAnalyzer.tokenizeJson instead.
- Deleted mixin LatinLanguageAnalyzerMixin.
- Moved class TermSimilarityBase from text_analysis library.
- Moved all mixins and base-classes to implementation mini-library.
New #
- New method TextAnalyzer.tokenize.
- New method TextAnalyzer.tokenizeJson.
- New class LatinLanguageAnalyzer.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.24.0-1 #
BREAKING CHANGES
Breaking changes #
- Interface TextTokenizer removed. Use TextAnalyzer.tokenize and TextAnalyzer.tokenizeJson instead.
- Deleted mixin LatinLanguageAnalyzerMixin.
- Moved class TermSimilarityBase from text_analysis library.
- Moved all mixins and base-classes to implementation mini-library.
New #
- New method TextAnalyzer.tokenize.
- New method TextAnalyzer.tokenizeJson.
- New class LatinLanguageAnalyzer.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.7+14 #
0.23.7+12 #
Bug fixes #
- Changed signature of TextTokenizer.tokenize and TextTokenizer.tokenizeJson to make parameter nGramRange nullable.
0.23.7+7 #
Bug fixes #
- Fixed keyword extraction bug.
- Changed signature of extension method kGrams on String.
0.23.7+5 #
Bug fixes #
- Removed implementation library and added its exports to text_analysis library.
0.23.7 #
0.23.5 #
BREAKING CHANGES
Breaking changes #
- Added field TermSimilarity.startsWithSimilarity.
- Changed signature of TermSimilarity unnamed factory constructor.
- Changed calculation of getSuggestions extension method to include startsWithSimilarity.
New #
- Extension method double startsWithSimilarity(Term other) on String.
- Extension method List<String> startsWith(Iterable<String> terms, [int limit = 10]) on String.
- Extension method Map<String, double> startsWithSimilarityMap(Iterable<String> terms) on String.
- Extension method List<SimilarityIndex> startsWithSimilarities(Iterable<String> terms) on String.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.4d #
Bug fixes #
- Fixed k-gram generation error.
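For readers unfamiliar with the term, a k-gram is a substring of k consecutive characters of a term. The following is a minimal sketch of k-gram generation, not the package's implementation: the function name kGramsSketch, the default k = 2, and the choice to keep short terms whole are all assumptions made for illustration, and the package's kGrams extension may behave differently.

```dart
/// Returns the set of k-grams (substrings of length [k]) of [term].
/// Hypothetical helper for illustration; not the package's kGrams extension.
Set<String> kGramsSketch(String term, [int k = 2]) {
  if (k < 1) {
    throw ArgumentError.value(k, 'k', 'must be at least 1');
  }
  // Assumption: a non-empty term no longer than k is kept whole.
  if (term.length <= k) {
    return term.isEmpty ? <String>{} : {term};
  }
  final grams = <String>{};
  // Slide a window of width k across the term.
  for (var i = 0; i + k <= term.length; i++) {
    grams.add(term.substring(i, i + k));
  }
  return grams;
}

void main() {
  print(kGramsSketch('tokenize')); // {to, ok, ke, en, ni, iz, ze}
}
```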
0.23.3 #
BREAKING CHANGES
Breaking changes #
- Changed signature of String extension method termSimilarities.
- Changed signature of String extension method termSimilarityMap.
- Changed signature of String extension method getSuggestions.
- Changed signature of String extension method matches.
- Changed signature of static method TermSimilarity.termSimilarities.
- Changed signature of static method TermSimilarity.termSimilarityMap.
- Changed signature of static method TermSimilarity.getSuggestions.
- Changed signature of static method TermSimilarity.matches.
- Changed calculation of getSuggestions.
Updated #
- Dependencies.
- Documentation.
0.23.2 #
0.23.1 #
Non-breaking changes #
- Added optional parameters to function definition Tokenizer.
- Added optional parameters to function definition JsonTokenizer.
- Added optional parameters to function definition KeywordExtractor.
Bug fixes #
- Fixed keyword extractor to return all keywords as lower-case.
- Fixed tokenizer to not return duplicate tokens (same term, zone and termPosition).
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.0+1 #
0.23.0 #
0.22.0 #
BREAKING CHANGES
Breaking changes #
- Added field TextAnalyzer.phraseSplitter.
- Added field TextDocument.keywords.
- Changed signature of TextDocument unnamed factory constructor.
- Moved export of all mixins and base-classes to implementation mini-library.
- Changed function definition TermFilter.
New #
- New enum TokenizingStrategy.
- New class TermCoOccurrenceGraph.
- New mixin class LatinLanguageAnalyzerMixin.
- New type alias Phrase.
- New function definition KeywordExtractor.
- New extension method Set<String> toUniqueTerms() on Iterable<List<String>>.
- New extension method Map<String, List<int>> coOccurenceGraph(List<String> terms) on Iterable<List<String>>.
- Added optional named parameter TokenizingStrategy strategy to TextTokenizer.tokenize method.
- Added optional named parameter TokenizingStrategy strategy to TextTokenizer.tokenizeJson method.
- Implemented method English.KeywordExtractor.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.21.0 #
BREAKING CHANGES
Breaking changes #
- Static method TextDocument.analyze signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
- Static method TextDocument.analyzeJson signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
- Method TextTokenizer.tokenize signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
- Method TextTokenizer.tokenizeJson signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
Bug fixes #
- Fixed bugs where n-grams would contain repeated words.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.20.0 #
BREAKING CHANGES
Breaking changes #
- Renamed NGramRange.nMin to NGramRange.min.
- Renamed NGramRange.nMax to NGramRange.max.
- Changed signature of TextDocument unnamed factory.
- Changed signature of TextDocument.analyze factory.
- Changed signature of TextDocument.analyzeJson factory.
- Removed field TextDocument.analyzer.
New #
- Implemented NGramRange.== and NGramRange.hashCode.
- Added extension method nGrams(NGramRange range) on List<String>.
- Added typedef NGrammer = List<String> Function(String text, NGramRange range).
- Added field TextAnalyzer.nGrammer.
- Implemented field English.nGrammer.
- Added field TextDocument.syllableCount.
- Added field TextDocument.nGrams.
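As a rough illustration of what an n-gram range means, the sketch below builds every n-gram of a term list for n between a minimum and a maximum, mirroring the min and max fields named above. It is not the package's nGrams extension: the function name nGramsSketch, the named parameters, the space-joined output and the ordering are assumptions made for this example.

```dart
/// Builds all n-grams of [terms] with min <= n <= max, joined by spaces.
/// Hypothetical helper for illustration; not the package's nGrams extension.
List<String> nGramsSketch(List<String> terms, {int min = 1, int max = 1}) {
  if (min < 1 || max < min) {
    throw ArgumentError('expected 1 <= min <= max');
  }
  final result = <String>[];
  for (var start = 0; start < terms.length; start++) {
    // Each n-gram takes n consecutive terms, so no term repeats within it.
    for (var n = min; n <= max && start + n <= terms.length; n++) {
      result.add(terms.sublist(start, start + n).join(' '));
    }
  }
  return result;
}

void main() {
  final terms = ['quick', 'brown', 'fox'];
  print(nGramsSketch(terms, min: 1, max: 2));
  // [quick, quick brown, brown, brown fox, fox]
}
```

With min and max both set to 1, the sketch returns the original terms unchanged.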
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.19.0 #
BREAKING CHANGES
Breaking changes #
- Changed signature of TextTokenizer.tokenize.
- Changed signature of TextTokenizer.tokenizeJson.
- Changed TextTokenizer.tokenize algorithm to generate an n-gram for each token, using an n-gram range.
- Changed signature of Token default constructor by adding unnamed parameter Token.n.
New #
- Added class NGramRange.
- Added field int Token.n.
- Added optional named parameter NGramRange nGramRange = NGramRange(1, 2) to TextDocument.analyze factory.
- Added optional named parameter NGramRange nGramRange = NGramRange(1, 2) to TextDocument.analyzeJson factory.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.18.0 #
BREAKING CHANGES
Breaking changes #
- Removed static method TermSimilarity.editDistance.
- Removed static method TermSimilarity.editSimilarity.
- Removed static method TermSimilarity.editSimilaritiesMap.
- Removed static method TermSimilarity.lengthDistance.
- Removed static method TermSimilarity.lengthSimilarity.
- Removed static method TermSimilarity.lengthSimilaritiesMap.
- Removed static method TermSimilarity.jaccardSimilarity.
- Removed static method TermSimilarity.jaccardSimilaritiesMap.
- Removed static method TermSimilarity.termSimilarities.
- Removed static method TermSimilarity.termSimilarity.
- Changed signature of String extension method termSimilarities.
- Changed signature of String extension method termSimilarityMap.
- Changed signature of String extension method getSuggestions.
- Changed signature of String extension method matches.
- Changed signature of static method TermSimilarity.termSimilarities.
- Changed signature of static method TermSimilarity.termSimilarityMap.
- Changed signature of static method TermSimilarity.getSuggestions.
- Changed signature of static method TermSimilarity.matches.
New #
- Added mixin class TermSimilarityMixin.
- Added base class TermSimilarityBase.
- Added class property TermSimilarity.term.
- Added class property TermSimilarity.other.
- Added class property TermSimilarity.editDistance.
- Added class property TermSimilarity.editSimilarity.
- Added class property TermSimilarity.lengthDistance.
- Added class property TermSimilarity.lengthSimilarity.
- Added class property TermSimilarity.jaccardSimilarity.
- Added class property TermSimilarity.characterSimilarity.
- Added class property TermSimilarity.similarity.
- Added class method TermSimilarity.toJson().
- Added class method TermSimilarity.compareTo(TermSimilarity other).
- Added extension method sortBySimilarity(bool descending = true) on Iterable<TermSimilarity>.
- Added extension method limit(int? limit) on Iterable<TermSimilarity>.
- Added extension method sortBySimilarity(bool descending = true) on Iterable<SimilarityIndex>.
- Added extension method limit(int? limit) on Iterable<SimilarityIndex>.
- Added unnamed constructor TermSimilarity.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.17.0 #
BREAKING CHANGES
Breaking changes #
- Changed algorithm for calculating TermSimilarity.termSimilarity to apply weighting.
New #
- Added class SimilarityIndex.
- Added String extension method getSuggestions.
- Added class method TermSimilarity.getSuggestions.
- Added String extension method lengthSimilarities.
- Added class method TermSimilarity.lengthSimilarities.
- Added String extension method editSimilarities.
- Added class method TermSimilarity.editSimilarities.
- Added String extension method jaccardSimilarities.
- Added class method TermSimilarity.jaccardSimilarities.
- Added String extension method termSimilarities.
- Added class method TermSimilarity.termSimilarities.
- Added function definition KGramsMap.
- Added extension Set<KGram> toKGramsMap([int k = 2]) on Iterable<String>.
- Added enumeration PartOfSpeech.
- Added enumeration PoSTag.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.16.0 #
BREAKING CHANGES
Breaking changes #
- Removed stopWords and abbreviations fields from TextAnalyzer.
- Changed implementation of TextTokenizer.tokenize to match removal of stopWords and abbreviations from TextAnalyzer.
- Moved all constants from English to EnglishConstants.
- Removed parameters stemmer, lemmatizer, stopWords and abbreviations from English. Extend the English class to use different values for these fields.
- Changed implementation of English.termSplitter.
- Changed signature of TextTokenizer unnamed factory constructor, now requires analyzer parameter.
New #
- New mini-library constants.
- New extension class on String EnglishStringExtensions added to extensions mini-library.
- New static const TextTokenizer.english shortcut factory method.
Bug fixes #
- Fixed handling of accented characters in English.syllableCounter.
- Fixed bugs in English.syllableCounter to improve accuracy when dealing with hyphenated terms, abbreviations and apostrophes of contraction.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.15.1 #
0.15.0 #
0.14.0 #
BREAKING CHANGES
Breaking changes #
- Removed library package_exports.
- The Porter2Stemmer class from the porter_2_stemmer package is exported by the text_indexer library.
- The Porter2StemmerExtension String extension is exported by the extensions library.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0 #
BREAKING CHANGES
Breaking changes #
- Added field TextAnalyzer.stemmer to TextAnalyzer class.
- Added field TextAnalyzer.stopWords to TextAnalyzer class.
- Added field TextAnalyzer.lemmatizer to TextAnalyzer class.
- Added field TextAnalyzer.termExceptions to TextAnalyzer class.
- Removed static field TextTokenizer.defaultTokenFilter.
- Changed TextTokenizer.tokenize method to apply analyzer.stemmer, analyzer.stopWords, analyzer.lemmatizer and analyzer.termExceptions to all tokens/terms.
New #
- Implemented field English.stemmer in English class.
- Implemented field English.stopWords in English class.
- Implemented field English.lemmatizer in English class.
- Implemented field English.termExceptions in English class.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0-1 #
BREAKING CHANGES
Breaking changes #
- Added field TextAnalyzer.stemmer to TextAnalyzer class.
- Added field TextAnalyzer.stopWords to TextAnalyzer class.
- Added field TextAnalyzer.lemmatizer to TextAnalyzer class.
- Added field TextAnalyzer.termExceptions to TextAnalyzer class.
- Removed static field TextTokenizer.defaultTokenFilter.
- Changed TextTokenizer.tokenize method to apply analyzer.stemmer, analyzer.stopWords, analyzer.lemmatizer and analyzer.termExceptions to all tokens/terms.
New #
- Implemented field English.stemmer in English class.
- Implemented field English.stopWords in English class.
- Implemented field English.lemmatizer in English class.
- Implemented field English.termExceptions in English class.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.1 #
0.12.0 #
BREAKING CHANGES
Breaking changes #
- Removed the String extension TermSimilarityExtensions (extension TermSimilarityExtensions on String) from the text_analysis library. Import the extensions library instead.
- Type definitions removed from text_analysis library. Import the type_definitions library instead.
- Package export porter_2_stemmer removed from text_analysis library. Import the package_exports library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added int term.editDistance(String other) extension on String.
- Added double term.editDistanceSimilarity(String other) extension on String.
- Added class TermSimilarity that exposes static methods for comparing terms.
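To give a sense of what an edit distance between two terms measures, here is a minimal sketch of the classic Levenshtein distance together with one possible normalization to a similarity score. The package's own definition of editDistance and editDistanceSimilarity is not stated in this changelog and may differ. The names levenshteinSketch and editSimilaritySketch, and the normalization by the longer term's length, are assumptions made for this example.

```dart
/// Classic Levenshtein distance between [a] and [b]: the minimum number of
/// single-character insertions, deletions and substitutions.
/// Hypothetical helper; the package's editDistance may be defined differently.
int levenshteinSketch(String a, String b) {
  if (a.isEmpty) return b.length;
  if (b.isEmpty) return a.length;
  // previous[j] is the distance between the processed prefix of a
  // and the first j code units of b.
  var previous = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final current = List<int>.filled(b.length + 1, 0);
    current[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      final deletion = previous[j] + 1;
      final insertion = current[j - 1] + 1;
      final substitution = previous[j - 1] + cost;
      var best = deletion < insertion ? deletion : insertion;
      if (substitution < best) best = substitution;
      current[j] = best;
    }
    previous = current;
  }
  return previous[b.length];
}

/// Assumed normalization: 1 minus the distance divided by the longer length.
double editSimilaritySketch(String a, String b) {
  final longest = a.length > b.length ? a.length : b.length;
  if (longest == 0) return 1.0;
  return 1.0 - levenshteinSketch(a, b) / longest;
}

void main() {
  print(levenshteinSketch('kitten', 'sitting')); // 3
  print(editSimilaritySketch('kitten', 'sitting'));
}
```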
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0-2 #
BREAKING CHANGES
Breaking changes #
- Removed the String extension TermSimilarityExtensions (extension TermSimilarityExtensions on String) from the text_analysis library. Import the extensions library instead.
- Type definitions removed from text_analysis library. Import the type_definitions library instead.
- Package export porter_2_stemmer removed from text_analysis library. Import the package_exports library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added int term.editDistance(String other) extension on String.
- Added double term.editDistanceSimilarity(String other) extension on String.
- Added class TermSimilarity that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0-1 #
BREAKING CHANGES
Breaking changes #
- Removed the String extension TermSimilarityExtensions (extension TermSimilarityExtensions on String) from the text_analysis library. Import the extensions library instead.
- Type definitions removed from text_analysis library. Import the type_definitions library instead.
- Package export porter_2_stemmer removed from text_analysis library. Import the package_exports library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added int term.editDistance(String other) extension on String.
- Added double term.editDistanceSimilarity(String other) extension on String.
- Added class TermSimilarity that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.11.2 #
0.11.1 #
0.11.0 #
BREAKING CHANGES
This version sees numerous breaking changes, including the re-naming of the primary interfaces of the library.
Breaking changes #
- Renamed TextAnalyzer interface to TextTokenizer.
- Renamed TextAnalyzerConfiguration interface to TextAnalyzer.
- Added SentenceSplitter get sentenceSplitter to TextAnalyzer interface.
- Added ParagraphSplitter get paragraphSplitter to TextAnalyzer interface.
- Added SyllableCounter get syllableCounter to TextAnalyzer interface.
- Added List<String> paragraphs(SourceText source) to ITextTokenizer interface.
- Moved class TextTokenizer to a private implementation class _TextTokenizerImpl and renamed ITextTokenizer interface to TextTokenizer.
New #
- Added mixin class TextTokenizerMixin.
- Added object model TextDocument.
- Added typedef SyllableCounter.
- Added unnamed factory constructor to TextTokenizer that initializes a _TextTokenizerImpl.
- Added SentenceSplitter get sentenceSplitter to English class.
- Added ParagraphSplitter get paragraphSplitter to English class.
- Added SyllableCounter get syllableCounter to English class.
- Added TextDocument interface.
- Added TextDocumentMixin mixin class.
- Added TextDocument unnamed factory with private implementation class.
- Added TextDocument.analyze factory constructor.
- Added TextDocument.analyzeJson factory constructor.
- Added extension on String double lengthDistance(Term other).
- Added extension on String double lengthSimilarity(Term other).
- Added extension on String Map<Term, double> lengthSimilarityMap(Iterable<Term> terms).
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.10.0 #
BREAKING CHANGES
This version sees numerous breaking changes, including the re-naming of the primary interfaces of the library.
Breaking changes #
- Renamed TextAnalyzer interface to TextTokenizer.
- Renamed TextAnalyzerConfiguration interface to TextAnalyzer.
- Added SentenceSplitter get sentenceSplitter to TextAnalyzer interface.
- Added ParagraphSplitter get paragraphSplitter to TextAnalyzer interface.
- Added SyllableCounter get syllableCounter to TextAnalyzer interface.
- Added List<String> paragraphs(SourceText source) to ITextTokenizer interface.
- Moved class TextTokenizer to a private implementation class _TextTokenizerImpl and renamed ITextTokenizer interface to TextTokenizer.
New #
- Added mixin class TextTokenizerMixin.
- Added object model TextDocument.
- Added typedef SyllableCounter.
- Added unnamed factory constructor to TextTokenizer that initializes a _TextTokenizerImpl.
- Added SentenceSplitter get sentenceSplitter to English class.
- Added ParagraphSplitter get paragraphSplitter to English class.
- Added SyllableCounter get syllableCounter to English class.
- Added TextDocument interface.
- Added TextDocumentMixin mixin class.
- Added TextDocument unnamed factory with private implementation class.
- Added TextDocument.analyze factory constructor.
- Added TextDocument.analyzeJson factory constructor.
- Added extension on String double lengthDistance(Term other).
- Added extension on String double lengthSimilarity(Term other).
- Added extension on String Map<Term, double> lengthSimilarityMap(Iterable<Term> terms).
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.9.1 #
0.9.0 #
BREAKING CHANGES
Breaking changes #
- Removed class TextSource.
- Removed class Sentence.
- Removed class TermPair.
- Removed TextAnalyzer.sentenceSplitter from TextAnalyzer interface.
- Changed TextTokenizer.tokenize return value to List<Token>.
- Changed TextTokenizer.tokenizeJson return value to List<Token>.
0.8.1 #
0.8.0 #
0.7.0 #
0.6.5+1 #
PRE-RELEASE
Minor bug fixes, updated dependencies, tests, examples and documentation.
0.6.5 #
0.6.4 #
0.6.3 #
0.6.2 #
0.6.1 #
PRE-RELEASE
- Added type aliases to improve code readability.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.0 #
0.5.0 #
0.4.1 #
0.4.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Added Token.field property to token, breaks default generative constructor.
- Added FieldName? field optional parameter to TextTokenizer.tokenize method.
- Removed deprecated property Token.index, use Token.termPosition instead.
- Removed deprecated property Token.position, use Token.termPosition instead.
- Removed deprecated extension method Iterable<Token>.maxIndex, use Iterable<Token>.Iterable
- Removed extension method Iterable<Token>.minIndex, use Iterable<Token>.Iterable
New #
- Added new method ITextAnalyser.tokenizeJson.
- Added new tests.
- Added new examples.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.3.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- TextAnalyzer.characterFilter changed to non-nullable. Use (phrase) => phrase if no characterFilter is required.
- TextAnalyzer.termFilter changed to non-nullable. Use (phrase) => [phrase] if no termFilter is required.
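The two pass-through functions suggested above can be sketched as follows. This is an illustration only: the typedef names, the FilterConfigSketch class and the function signatures are hypothetical stand-ins, and the package's actual CharacterFilter and TermFilter types may be declared differently.

```dart
// Hypothetical function types named after the fields in the entry above;
// the package's real typedefs may have different signatures.
typedef CharacterFilterSketch = String Function(String phrase);
typedef TermFilterSketch = List<String> Function(String phrase);

// Pass-through filters: equivalent to (phrase) => phrase and
// (phrase) => [phrase], written as named functions so that they can be
// used as constant default values.
String passThroughCharacterFilter(String phrase) => phrase;
List<String> passThroughTermFilter(String phrase) => [phrase];

/// Minimal stand-in for a configuration with non-nullable filters.
class FilterConfigSketch {
  const FilterConfigSketch({
    this.characterFilter = passThroughCharacterFilter,
    this.termFilter = passThroughTermFilter,
  });

  final CharacterFilterSketch characterFilter;
  final TermFilterSketch termFilter;
}

void main() {
  const config = FilterConfigSketch();
  final cleaned = config.characterFilter('Hello World');
  print(config.termFilter(cleaned)); // [Hello World]
}
```

Because the fields are non-nullable, callers never need a null check before applying a filter; a filter that does nothing is expressed as a function rather than as null.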
New #
- Added porter_2_stemmer package export so it does not need to be imported separately.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.2.0 #
0.1.0 #
0.0.12 #
0.0.9-beta.1 #
0.0.8 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Removed relevance extension method from TokenCollectionExtension.
0.0.3 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Stemmer removed from English configuration.
- Stemmer incorporated into default tokenFilter for TextTokenizer.
0.0.1-beta.1 #
PRE-RELEASE
Initial version.