hybridTfIdfProbability function
Word probability calculation: Hybrid Term Frequency - Inverse Document Frequency (Hybrid TF-IDF).
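This function calculates, for each word, the probability that the word is a main topic of the text (its importance score).
It differs from standard TF-IDF in how TF is computed: TF is the word's number of occurrences across all documents divided by the total number of words in all documents, not its frequency within a single document. IDF is unchanged: log10 of the number of documents divided by the number of documents that contain the word.
Returns a map from each word to its probability.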
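As a reading aid, the score described above can be written as a single formula. The symbols here are illustrative names rather than identifiers from the library: c(w) is the number of occurrences of word w across all documents, N is the total number of words in all documents, D is the number of documents, and d(w) is the number of documents that contain w.

```latex
\[
  \mathrm{score}(w) \;=\; \underbrace{\frac{c(w)}{N}}_{\text{hybrid TF}}
  \;\times\; \underbrace{\log_{10}\!\frac{D}{d(w)}}_{\text{IDF}}
\]
% Worked example with made-up numbers: c(w) = 3, N = 20, D = 4, d(w) = 2
\[
  \mathrm{score}(w) \;=\; \frac{3}{20} \times \log_{10}\frac{4}{2}
  \;\approx\; 0.15 \times 0.30103 \;\approx\; 0.04515
\]
```

A word that appears in every document has d(w) = D, so its IDF is log10(1) = 0 and its score is 0 no matter how often it occurs.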
Implementation
Map<String, double> hybridTfIdfProbability(TokenizationOutput tokenOut) {
  Map<String, double> wordsProbability = {};
  int documentCount = tokenOut.documentTotalWord.length;

  // Calculate the IDF first; the same value is used for every document.
  // log and ln10 come from dart:math.
  Map<String, double> wordIDF = {};
  tokenOut.wordInDocumentOccurrence.forEach((key, val) {
    // log10(documentCount / number of documents containing the word)
    wordIDF[key] = log(documentCount / val) / ln10;
  });

  // For every word: TF is its count over the total number of words
  // in all documents, then the score is TF * IDF.
  tokenOut.bagOfWords.forEach((key, val) {
    var tf = val / tokenOut.totalNumberOfWords;
    var idf = wordIDF[key]!;
    wordsProbability[key] = tf * idf;
  });

  return wordsProbability;
}
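The listing above depends on `TokenizationOutput`, whose construction is not shown here. The following is a minimal, self-contained sketch of the same calculation on plain maps, intended only to make the arithmetic easy to try. The function name `hybridTfIdfSketch` and its parameters are hypothetical and not part of the library; the sample documents are invented.

```dart
import 'dart:math';

/// Hypothetical stand-in for hybridTfIdfProbability that takes plain maps
/// instead of a TokenizationOutput (an assumption made for illustration).
///
/// [bagOfWords]        word -> occurrences across all documents
/// [documentFrequency] word -> number of documents containing the word
/// [documentCount]     total number of documents
Map<String, double> hybridTfIdfSketch({
  required Map<String, int> bagOfWords,
  required Map<String, int> documentFrequency,
  required int documentCount,
}) {
  // Total number of words across all documents.
  final totalWords = bagOfWords.values.fold<int>(0, (sum, n) => sum + n);

  final scores = <String, double>{};
  bagOfWords.forEach((word, count) {
    final tf = count / totalWords; // hybrid TF: over the whole collection
    final idf = log(documentCount / documentFrequency[word]!) / ln10; // log10
    scores[word] = tf * idf;
  });
  return scores;
}

void main() {
  // Three invented documents: "cat sat", "cat ran", "dog ran fast".
  final scores = hybridTfIdfSketch(
    bagOfWords: {'cat': 2, 'sat': 1, 'ran': 2, 'dog': 1, 'fast': 1},
    documentFrequency: {'cat': 2, 'sat': 1, 'ran': 2, 'dog': 1, 'fast': 1},
    documentCount: 3,
  );
  scores.forEach((word, score) => print('$word: ${score.toStringAsFixed(4)}'));
}
```

In this sample, "sat" ends up with a higher score than "cat" even though "cat" occurs twice, because "sat" appears in only one of the three documents and therefore has a larger IDF.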