text_analysis 0.24.0-2
Tokenize text, compute readability scores for a document and evaluate similarity of terms.
0.24.0-2 #
BREAKING CHANGES
Breaking changes #
- Interface TextTokenizer removed. Use TextAnalyzer.tokenize and TextAnalyzer.tokenizeJson instead.
- Deleted mixin LatinLanguageAnalyzerMixin.
- Moved class TermSimilarityBase from text_analysis library.
- Moved all mixins and base-classes to implementation mini-library.
New #
- New method TextAnalyzer.tokenize.
- New method TextAnalyzer.tokenizeJson.
- New class LatinLanguageAnalyzer.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.24.0-1 #
BREAKING CHANGES
Breaking changes #
- Interface TextTokenizer removed. Use TextAnalyzer.tokenize and TextAnalyzer.tokenizeJson instead.
- Deleted mixin LatinLanguageAnalyzerMixin.
- Moved class TermSimilarityBase from text_analysis library.
- Moved all mixins and base-classes to implementation mini-library.
New #
- New method TextAnalyzer.tokenize.
- New method TextAnalyzer.tokenizeJson.
- New class LatinLanguageAnalyzer.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.7+14 #
0.23.7+12 #
Bug fixes #
- Changed signature of TextTokenizer.tokenize and TextTokenizer.tokenizeJson to make parameter nGramRange nullable.
0.23.7+7 #
Bug fixes #
- Fixed keyword extraction bug.
- Changed signature of extension method kGrams on String.
0.23.7+5 #
Bug fixes #
- Removed implementation library and added its exports to text_analysis library.
0.23.7 #
0.23.5 #
BREAKING CHANGES
Breaking changes #
- Added field TermSimilarity.startsWithSimilarity.
- Changed signature of TermSimilarity unnamed factory constructor.
- Changed calculation of getSuggestions extension method to include startsWithSimilarity.
New #
- Extension method double startsWithSimilarity(Term other) on String.
- Extension method List<String> startsWith(Iterable<String> terms, [int limit = 10]) on String.
- Extension method Map<String, double> startsWithSimilarityMap(Iterable<String> terms) on String.
- Extension method List<SimilarityIndex> startsWithSimilarities(Iterable<String> terms) on String.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.23.4d #
Bug fixes #
- Fixed k-gram generation error.
0.23.3 #
BREAKING CHANGES
Breaking changes #
- Changed signature of String extension method termSimilarities.
- Changed signature of String extension method termSimilarityMap.
- Changed signature of String extension method getSuggestions.
- Changed signature of String extension method matches.
- Changed signature of static method TermSimilarity.termSimilarities.
- Changed signature of static method TermSimilarity.termSimilarityMap.
- Changed signature of static method TermSimilarity.getSuggestions.
- Changed signature of static method TermSimilarity.matches.
- Changed calculation of getSuggestions.
Updated #
- Dependencies.
- Documentation.
0.23.2 #
0.23.1 #
Non-breaking changes #
- Added optional parameters to function definition Tokenizer.
- Added optional parameters to function definition JsonTokenizer.
- Added optional parameters to function definition KeywordExtractor.
Bug fixes #
- Fixed keyword extractor to return all keywords as lower-case.
- Fixed tokenizer to not return duplicate tokens (same term, zone and termPosition).
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
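As an illustration of the duplicate-token fix in this release, the sketch below drops tokens that repeat the same term, zone and termPosition. SimpleToken and dedupe are hypothetical names introduced for this example and are not part of the package API; the package's own Token class has more fields and may implement the check differently.

```dart
/// Hypothetical, simplified token. Only the three fields named in the
/// changelog entry (term, zone, termPosition) are modelled here.
class SimpleToken {
  final String term;
  final int termPosition;
  final String? zone;

  const SimpleToken(this.term, this.termPosition, [this.zone]);

  // Two tokens are equal when term, zone and termPosition all match.
  @override
  bool operator ==(Object other) =>
      other is SimpleToken &&
      other.term == term &&
      other.zone == zone &&
      other.termPosition == termPosition;

  @override
  int get hashCode => Object.hash(term, zone, termPosition);
}

/// Keeps the first occurrence of each (term, zone, termPosition) triple
/// and preserves the original order of the remaining tokens.
List<SimpleToken> dedupe(Iterable<SimpleToken> tokens) {
  final seen = <SimpleToken>{};
  // Set.add returns false when an equal token is already present.
  return [
    for (final token in tokens)
      if (seen.add(token)) token,
  ];
}
```

Tokens that share a term but differ in zone or position are kept, since they mark distinct occurrences in the source text.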
0.23.0+1 #
0.23.0 #
0.22.0 #
BREAKING CHANGES
Breaking changes #
- Added field TextAnalyzer.phraseSplitter.
- Added field TextDocument.keywords.
- Changed signature of TextDocument unnamed factory constructor.
- Moved export of all mixins and base-classes to implementation mini-library.
- Changed function definition TermFilter.
New #
- New enum TokenizingStrategy.
- New class TermCoOccurrenceGraph.
- New mixin class LatinLanguageAnalyzerMixin.
- New type alias Phrase.
- New function definition KeywordExtractor.
- New extension method Set<String> toUniqueTerms() on Iterable<List<String>>.
- New extension method Map<String, List<int>> coOccurenceGraph(List<String> terms) on Iterable<List<String>>.
- Added optional named parameter TokenizingStrategy strategy to TextTokenizer.tokenize method.
- Added optional named parameter TokenizingStrategy strategy to TextTokenizer.tokenizeJson method.
- Implemented method English.KeywordExtractor.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.21.0 #
BREAKING CHANGES
Breaking changes #
- Static method TextDocument.analyze signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
- Static method TextDocument.analyzeJson signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
- Method TextTokenizer.tokenize signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
- Method TextTokenizer.tokenizeJson signature changed. Default for parameter nGramRange changed to NGramRange(1, 1).
Bug fixes #
- Fixed bugs where n-grams would contain repeated words.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.20.0 #
BREAKING CHANGES
Breaking changes #
- Renamed NGramRange.nMin to NGramRange.min.
- Renamed NGramRange.nMax to NGramRange.max.
- Changed signature of TextDocument unnamed factory.
- Changed signature of TextDocument.analyze factory.
- Changed signature of TextDocument.analyzeJson factory.
- Removed field TextDocument.analyzer.
New #
- Implemented NGramRange.== and NGramRange.hashCode.
- Added extension method nGrams(NGramRange range) on List<String>.
- Added typedef NGrammer = List<String> Function(String text, NGramRange range).
- Added field TextAnalyzer.nGrammer.
- Implemented field English.nGrammer.
- Added field TextDocument.syllableCount.
- Added field TextDocument.nGrams.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.19.0 #
BREAKING CHANGES
Breaking changes #
- Changed signature of TextTokenizer.tokenize.
- Changed signature of TextTokenizer.tokenizeJson.
- Changed TextTokenizer.tokenize algorithm to generate an n-gram for each token, using an n-gram range.
- Changed signature of Token default constructor by adding unnamed parameter Token.n.
New #
- Added class NGramRange.
- Added field int Token.n.
- Added optional named parameter NGramRange nGramRange = NGramRange(1, 2) to TextDocument.analyze factory.
- Added optional named parameter NGramRange nGramRange = NGramRange(1, 2) to TextDocument.analyzeJson factory.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
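The n-gram range introduced in this release can be pictured with the minimal sketch below, which builds every run of consecutive terms whose length falls inside a range. Range and nGramsOf are hypothetical names for this example, and joining terms with a single space is an assumption; the package's NGramRange class and tokenizer may order or format n-grams differently.

```dart
/// Hypothetical stand-in for the package's NGramRange. Only the idea of a
/// minimum and maximum n-gram length is taken from the changelog.
class Range {
  final int min;
  final int max;

  const Range(this.min, this.max)
      : assert(min >= 1),
        assert(max >= min);
}

/// Returns every run of range.min to range.max consecutive terms, joined
/// with a single space, ordered by starting position and then by length.
List<String> nGramsOf(List<String> terms, Range range) {
  final result = <String>[];
  for (var start = 0; start < terms.length; start++) {
    for (var n = range.min; n <= range.max; n++) {
      final end = start + n;
      // Stop once the n-gram would run past the end of the term list.
      if (end > terms.length) break;
      result.add(terms.sublist(start, end).join(' '));
    }
  }
  return result;
}
```

With the terms the, quick, fox and a range of (1, 2), this sketch yields the, the quick, quick, quick fox and fox. With a range of (1, 1), the default from 0.21.0 onwards, it yields only the single terms.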
0.18.0 #
BREAKING CHANGES
Breaking changes #
- Removed static method TermSimilarity.editDistance.
- Removed static method TermSimilarity.editSimilarity.
- Removed static method TermSimilarity.editSimilaritiesMap.
- Removed static method TermSimilarity.lengthDistance.
- Removed static method TermSimilarity.lengthSimilarity.
- Removed static method TermSimilarity.lengthSimilaritiesMap.
- Removed static method TermSimilarity.jaccardSimilarity.
- Removed static method TermSimilarity.jaccardSimilaritiesMap.
- Removed static method TermSimilarity.termSimilarities.
- Removed static method TermSimilarity.termSimilarity.
- Changed signature of String extension method termSimilarities.
- Changed signature of String extension method termSimilarityMap.
- Changed signature of String extension method getSuggestions.
- Changed signature of String extension method matches.
- Changed signature of static method TermSimilarity.termSimilarities.
- Changed signature of static method TermSimilarity.termSimilarityMap.
- Changed signature of static method TermSimilarity.getSuggestions.
- Changed signature of static method TermSimilarity.matches.
New #
- Added mixin class TermSimilarityMixin.
- Added base class TermSimilarityBase.
- Added class property TermSimilarity.term.
- Added class property TermSimilarity.other.
- Added class property TermSimilarity.editDistance.
- Added class property TermSimilarity.editSimilarity.
- Added class property TermSimilarity.lengthDistance.
- Added class property TermSimilarity.lengthSimilarity.
- Added class property TermSimilarity.jaccardSimilarity.
- Added class property TermSimilarity.characterSimilarity.
- Added class property TermSimilarity.similarity.
- Added class method TermSimilarity.toJson().
- Added class method TermSimilarity.compareTo(TermSimilarity other).
- Added extension method sortBySimilarity(bool descending = true) on Iterable<TermSimilarity>.
- Added extension method limit(int? limit) on Iterable<TermSimilarity>.
- Added extension method sortBySimilarity(bool descending = true) on Iterable<SimilarityIndex>.
- Added extension method limit(int? limit) on Iterable<SimilarityIndex>.
- Added unnamed constructor TermSimilarity.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.17.1 #
0.17.0 #
BREAKING CHANGES
Breaking changes #
- Changed algorithm for calculating TermSimilarity.termSimilarity to apply weighting.
New #
- Added class SimilarityIndex.
- Added String extension method getSuggestions.
- Added class method TermSimilarity.getSuggestions.
- Added String extension method lengthSimilarities.
- Added class method TermSimilarity.lengthSimilarities.
- Added String extension method editSimilarities.
- Added class method TermSimilarity.editSimilarities.
- Added String extension method jaccardSimilarities.
- Added class method TermSimilarity.jaccardSimilarities.
- Added String extension method termSimilarities.
- Added class method TermSimilarity.termSimilarities.
- Added function definition KGramsMap.
- Added extension Set<KGram> toKGramsMap([int k = 2]) on Iterable<String>.
- Added enumeration PartOfSpeech.
- Added enumeration PoSTag.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.16.0 #
BREAKING CHANGES
Breaking changes #
- Removed stopWords and abbreviations fields from TextAnalyzer.
- Changed implementation of TextTokenizer.tokenize to match removal of stopWords and abbreviations from TextAnalyzer.
- Moved all constants from English to EnglishConstants.
- Removed parameters stemmer, lemmatizer, stopWords and abbreviations from English. Extend the English class to use different values for these fields.
- Changed implementation of English.termSplitter.
- Changed signature of TextTokenizer unnamed factory constructor, now requires analyzer parameter.
New #
- New mini-library constants.
- New extension class on String EnglishStringExtensions added to extensions mini-library.
- New static const TextTokenizer.english shortcut factory method.
Bug fixes #
- Fixed handling of accented characters in English.syllableCounter.
- Fixed bugs in English.syllableCounter to improve accuracy when dealing with hyphenated terms, abbreviations and apostrophes of contraction.
Updated #
- Dependencies.
- Tests.
- Documentation.
- Examples.
0.15.1 #
0.15.0 #
0.14.0 #
BREAKING CHANGES
Breaking changes #
- Removed library package_exports.
- The Porter2Stemmer class from the porter_2_stemmer package is exported by the text_indexer library.
- The Porter2StemmerExtension String extension is exported by the extensions library.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0 #
BREAKING CHANGES
Breaking changes #
- Added field TextAnalyzer.stemmer to TextAnalyzer class.
- Added field TextAnalyzer.stopWords to TextAnalyzer class.
- Added field TextAnalyzer.lemmatizer to TextAnalyzer class.
- Added field TextAnalyzer.termExceptions to TextAnalyzer class.
- Removed static field TextTokenizer.defaultTokenFilter.
- Changed TextTokenizer.tokenize method to apply analyzer.stemmer, analyzer.stopWords, analyzer.lemmatizer and analyzer.termExceptions to all tokens/terms.
New #
- Implemented field English.stemmer in English class.
- Implemented field English.stopWords in English class.
- Implemented field English.lemmatizer in English class.
- Implemented field English.termExceptions in English class.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0-1 #
BREAKING CHANGES
Breaking changes #
- Added field TextAnalyzer.stemmer to TextAnalyzer class.
- Added field TextAnalyzer.stopWords to TextAnalyzer class.
- Added field TextAnalyzer.lemmatizer to TextAnalyzer class.
- Added field TextAnalyzer.termExceptions to TextAnalyzer class.
- Removed static field TextTokenizer.defaultTokenFilter.
- Changed TextTokenizer.tokenize method to apply analyzer.stemmer, analyzer.stopWords, analyzer.lemmatizer and analyzer.termExceptions to all tokens/terms.
New #
- Implemented field English.stemmer in English class.
- Implemented field English.stopWords in English class.
- Implemented field English.lemmatizer in English class.
- Implemented field English.termExceptions in English class.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.1 #
0.12.0 #
BREAKING CHANGES
Breaking changes #
- String extensions extension TermSimilarityExtensions on String removed from text_analysis library. Import the extensions library instead.
- Type definitions removed from text_analysis library. Import the type_definitions library instead.
- Package export porter_2_stemmer removed from text_analysis library, import the package_exports library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added int term.editDistance(String other) extension on String.
- Added double term.editDistanceSimilarity(String other) extension on String.
- Added class TermSimilarity that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
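To give a feel for the edit-distance extensions added in this release, here is a minimal sketch of a plain Levenshtein distance and one common way of turning it into a similarity score. The names levenshtein and editSimilarity are hypothetical, and both the distance variant and the normalization are assumptions; the package's editDistance and editDistanceSimilarity may be defined differently.

```dart
/// Levenshtein distance between a and b: the minimum number of
/// single-character insertions, deletions and substitutions needed to turn
/// a into b. Uses two rolling rows instead of a full matrix.
int levenshtein(String a, String b) {
  if (a.isEmpty) return b.length;
  if (b.isEmpty) return a.length;
  // previous[j] holds the distance between a[0..i-1) and b[0..j).
  var previous = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final current = List<int>.filled(b.length + 1, 0);
    current[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      final substitution = previous[j - 1] + cost;
      final deletion = previous[j] + 1;
      final insertion = current[j - 1] + 1;
      var best = substitution;
      if (deletion < best) best = deletion;
      if (insertion < best) best = insertion;
      current[j] = best;
    }
    previous = current;
  }
  return previous[b.length];
}

/// Assumed normalization: 1 minus the distance divided by the length of the
/// longer string, so identical strings score 1.0.
double editSimilarity(String a, String b) {
  final longest = a.length > b.length ? a.length : b.length;
  if (longest == 0) return 1.0;
  return 1.0 - levenshtein(a, b) / longest;
}
```

For example, levenshtein('kitten', 'sitting') returns 3 (two substitutions and one insertion).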
0.12.0-2 #
BREAKING CHANGES
Breaking changes #
- String extensions extension TermSimilarityExtensions on String removed from text_analysis library. Import the extensions library instead.
- Type definitions removed from text_analysis library. Import the type_definitions library instead.
- Package export porter_2_stemmer removed from text_analysis library, import the package_exports library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added int term.editDistance(String other) extension on String.
- Added double term.editDistanceSimilarity(String other) extension on String.
- Added class TermSimilarity that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0-1 #
BREAKING CHANGES
Breaking changes #
- String extensions extension TermSimilarityExtensions on String removed from text_analysis library. Import the extensions library instead.
- Type definitions removed from text_analysis library. Import the type_definitions library instead.
- Package export porter_2_stemmer removed from text_analysis library, import the package_exports library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New #
- Added int term.editDistance(String other) extension on String.
- Added double term.editDistanceSimilarity(String other) extension on String.
- Added class TermSimilarity that exposes static methods for comparing terms.
Bug fixes #
- Fixed issue with tokenizer not incrementing term positions.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.11.2 #
0.11.1 #
0.11.0 #
BREAKING CHANGES
This version sees numerous breaking changes, including the re-naming of the primary interfaces of the library.
Breaking changes #
- Renamed TextAnalyzer interface to TextTokenizer.
- Renamed TextAnalyzerConfiguration interface to TextAnalyzer.
- Added SentenceSplitter get sentenceSplitter to TextAnalyzer interface.
- Added ParagraphSplitter get paragraphSplitter to TextAnalyzer interface.
- Added SyllableCounter get syllableCounter to TextAnalyzer interface.
- Added List<String> paragraphs(SourceText source) to ITextTokenizer interface.
- Moved class TextTokenizer to a private implementation class _TextTokenizerImpl and renamed ITextTokenizer interface to TextTokenizer.
New #
- Added mixin class TextTokenizerMixin.
- Added object model TextDocument.
- Added typedef SyllableCounter.
- Added unnamed factory constructor to TextTokenizer that initializes a _TextTokenizerImpl.
- Added SentenceSplitter get sentenceSplitter to English class.
- Added ParagraphSplitter get paragraphSplitter to English class.
- Added SyllableCounter get syllableCounter to English class.
- Added TextDocument interface.
- Added TextDocumentMixin mixin class.
- Added TextDocument unnamed factory with private implementation class.
- Added TextDocument.analyze factory constructor.
- Added TextDocument.analyzeJson factory constructor.
- Added extension on String double lengthDistance(Term other).
- Added extension on String double lengthSimilarity(Term other).
- Added extension on String Map<Term, double> lengthSimilarityMap(Iterable<Term> terms).
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.10.0 #
BREAKING CHANGES
This version sees numerous breaking changes, including the re-naming of the primary interfaces of the library.
Breaking changes #
- Renamed TextAnalyzer interface to TextTokenizer.
- Renamed TextAnalyzerConfiguration interface to TextAnalyzer.
- Added SentenceSplitter get sentenceSplitter to TextAnalyzer interface.
- Added ParagraphSplitter get paragraphSplitter to TextAnalyzer interface.
- Added SyllableCounter get syllableCounter to TextAnalyzer interface.
- Added List<String> paragraphs(SourceText source) to ITextTokenizer interface.
- Moved class TextTokenizer to a private implementation class _TextTokenizerImpl and renamed ITextTokenizer interface to TextTokenizer.
New #
- Added mixin class TextTokenizerMixin.
- Added object model TextDocument.
- Added typedef SyllableCounter.
- Added unnamed factory constructor to TextTokenizer that initializes a _TextTokenizerImpl.
- Added SentenceSplitter get sentenceSplitter to English class.
- Added ParagraphSplitter get paragraphSplitter to English class.
- Added SyllableCounter get syllableCounter to English class.
- Added TextDocument interface.
- Added TextDocumentMixin mixin class.
- Added TextDocument unnamed factory with private implementation class.
- Added TextDocument.analyze factory constructor.
- Added TextDocument.analyzeJson factory constructor.
- Added extension on String double lengthDistance(Term other).
- Added extension on String double lengthSimilarity(Term other).
- Added extension on String Map<Term, double> lengthSimilarityMap(Iterable<Term> terms).
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.9.1 #
0.9.0 #
BREAKING CHANGES
Breaking changes #
- Removed class TextSource.
- Removed class Sentence.
- Removed class TermPair.
- Removed TextAnalyzer.sentenceSplitter from TextAnalyzer interface.
- Changed TextTokenizer.tokenize return value to List<Token>.
- Changed TextTokenizer.tokenizeJson return value to List<Token>.
0.8.1 #
0.8.0 #
0.7.0 #
0.6.5+1 #
PRE-RELEASE
Minor bug fixes, updated dependencies, tests, examples and documentation.
0.6.5 #
0.6.4 #
0.6.3 #
0.6.2 #
0.6.1 #
PRE-RELEASE
- Added type aliases to improve code readability.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.0 #
0.5.0 #
0.4.1 #
0.4.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Added Token.field property to token, breaks default generative constructor.
- Added FieldName? field optional parameter to TextTokenizer.tokenize method.
- Removed deprecated property Token.index, use Token.termPosition instead.
- Removed deprecated property Token.position, use Token.termPosition instead.
- Removed deprecated extension method Iterable<Token>.maxIndex, use Iterable<Token>. Iterable
- Removed extension method Iterable<Token>.minIndex, use Iterable<Token>. Iterable
New #
- Added new method ITextAnalyser.tokenizeJson.
- Added new tests.
- Added new examples.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.3.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- TextAnalyzer.characterFilter changed to non-nullable. Use (phrase) => phrase if no characterFilter is required.
- TextAnalyzer.termFilter changed to non-nullable. Use (phrase) => [phrase] if no termFilter is required.
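The pass-through filters suggested in this entry can be sketched as below. The typedef shapes, the function names and the applyFilters helper are all assumptions made for illustration; the package's real CharacterFilter and TermFilter definitions may differ, for example by being asynchronous.

```dart
// Assumed callback shapes. The names mirror the fields in the changelog,
// but the signatures are a guess based on the suggested lambdas.
typedef CharacterFilter = String Function(String phrase);
typedef TermFilter = List<String> Function(String phrase);

// Pass-through filters equivalent to (phrase) => phrase and
// (phrase) => [phrase].
String noCharacterFilter(String phrase) => phrase;
List<String> noTermFilter(String phrase) => [phrase];

/// Hypothetical helper showing where the two filters would sit: each phrase
/// is cleaned by the character filter, then expanded by the term filter.
List<String> applyFilters(
  Iterable<String> phrases, {
  CharacterFilter characterFilter = noCharacterFilter,
  TermFilter termFilter = noTermFilter,
}) {
  return [
    for (final phrase in phrases) ...termFilter(characterFilter(phrase)),
  ];
}
```

Because the fields are non-nullable, a caller that wants no filtering supplies these identity-style functions rather than null, which keeps the tokenizing code free of null checks.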
New #
- Added porter_2_stemmer package export so it does not need to be imported separately.
Updated #
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.2.0 #
0.1.0 #
0.0.12 #
0.0.9-beta.1 #
0.0.8 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Removed relevance extension method from TokenCollectionExtension.
0.0.3 #
PRE-RELEASE, BREAKING CHANGES
Breaking changes #
- Stemmer removed from English configuration.
- Stemmer incorporated into default tokenFilter for TextTokenizer.
0.0.1-beta.1 #
PRE-RELEASE
Initial version.