hybridTfIdfProbability function

Map<String, double> hybridTfIdfProbability(
  TokenizationOutput tokenOut
)

Word probability calculation using Hybrid Term Frequency - Inverse Document Frequency (Hybrid TF-IDF).

This function calculates, for each word, the probability that it is the main topic (its importance score).

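The difference from standard TF-IDF is that TF is the word's occurrence count across all documents divided by the total number of words in all documents, not its frequency within a single document. IDF is the same as in standard TF-IDF: the base-10 logarithm of the number of documents divided by the number of documents that contain the word.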
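
Written as a formula, the score computed by the implementation on this page can be sketched as follows. The symbol names are chosen here for illustration and do not appear in the source: f_w is the number of occurrences of word w across all documents (bagOfWords), N is the total number of words (totalNumberOfWords), D is the number of documents, and d_w is the number of documents containing w (wordInDocumentOccurrence).

```latex
\[
  \mathrm{score}(w) \;=\; \underbrace{\frac{f_w}{N}}_{\text{hybrid TF}}
  \;\times\; \underbrace{\log_{10}\!\frac{D}{d_w}}_{\text{IDF}}
\]
```

One consequence is that a word appearing in every document has d_w = D, so its IDF and therefore its score is 0.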

Returns a map from each word to its probability.

Implementation

import 'dart:math'; // provides log and ln10

Map<String, double> hybridTfIdfProbability(TokenizationOutput tokenOut) {
  Map<String, double> wordsProbability = {};
  // Number of documents.
  int documentCount = tokenOut.documentTotalWord.length;

  // Calculate the IDF first; the value is shared by all documents.
  Map<String, double> wordIDF = {};
  tokenOut.wordInDocumentOccurrence.forEach((key, docsWithWord) {
    // dart:math's log is the natural logarithm; divide by ln10 to get log10.
    wordIDF[key] = log(documentCount / docsWithWord) / ln10;
  });

  // For every word: TF over the whole corpus, multiplied by its IDF.
  tokenOut.bagOfWords.forEach((key, count) {
    var tf = count / tokenOut.totalNumberOfWords;
    var idf = wordIDF[key]!;
    wordsProbability[key] = tf * idf;
  });

  return wordsProbability;
}
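
To show the calculation end to end without depending on TokenizationOutput, here is a minimal, self-contained sketch that derives the same counts from already-tokenized documents. The function name hybridTfIdfSketch and its List<List<String>> input are hypothetical stand-ins for illustration only; they are not part of this library's API.

```dart
import 'dart:math';

/// Hypothetical helper (not part of the library): computes the hybrid
/// TF-IDF score for each word from documents given as lists of tokens.
Map<String, double> hybridTfIdfSketch(List<List<String>> documents) {
  final bagOfWords = <String, int>{}; // occurrences across all documents
  final docsContaining = <String, int>{}; // documents containing the word
  var totalWords = 0;

  for (final doc in documents) {
    totalWords += doc.length;
    for (final word in doc) {
      bagOfWords[word] = (bagOfWords[word] ?? 0) + 1;
    }
    // Count each word at most once per document.
    for (final word in doc.toSet()) {
      docsContaining[word] = (docsContaining[word] ?? 0) + 1;
    }
  }

  return {
    for (final entry in bagOfWords.entries)
      entry.key: (entry.value / totalWords) *
          (log(documents.length / docsContaining[entry.key]!) / ln10),
  };
}

void main() {
  final scores = hybridTfIdfSketch([
    ['dart', 'is', 'fast'],
    ['dart', 'is', 'fun'],
  ]);
  print(scores);
}
```

In this example 'dart' and 'is' occur in both documents, so their IDF is log10(2 / 2) = 0 and their score is 0, while 'fast' and 'fun' each occur in only one document and receive equal positive scores.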