string/text_similarity_utils library

Text similarity scoring via cosine similarity over term-frequency (TF) vectors (roadmap #437).

Functions

cosineSimilarity(Map<String, int> a, Map<String, int> b) double
Cosine similarity between two term-frequency maps (0.0 to 1.0). Audited: 2026-06-12 11:26 EDT
termFrequencies(List<String> tokens) Map<String, int>
Term frequencies for a list of tokens (e.g. words). Audited: 2026-06-12 11:26 EDT
textSimilarity(String a, String b) double
Returns cosine similarity of a and b when treated as bags of words. Audited: 2026-06-12 11:26 EDT
textToTf(String s) Map<String, int>
Tokenizes s by lowercasing it and splitting on non-letter characters; returns the resulting TF map. Audited: 2026-06-12 11:26 EDT
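
Example

The following is a minimal sketch of how the four functions listed above might fit together. It is not the library's actual source. Several details are assumptions made for illustration: "letters" is taken to mean ASCII a-z after lowercasing, empty tokens are discarded, and a similarity of 0.0 is returned when either map is empty or has zero magnitude.

```dart
import 'dart:math' as math;

/// Counts how often each token occurs in [tokens].
Map<String, int> termFrequencies(List<String> tokens) {
  final tf = <String, int>{};
  for (final token in tokens) {
    tf[token] = (tf[token] ?? 0) + 1;
  }
  return tf;
}

/// Lowercases [s], splits on runs of non-letters, and counts the tokens.
/// Assumption: only ASCII a-z count as letters in this sketch.
Map<String, int> textToTf(String s) {
  final tokens = s
      .toLowerCase()
      .split(RegExp(r'[^a-z]+'))
      .where((token) => token.isNotEmpty) // drop empty edge tokens
      .toList();
  return termFrequencies(tokens);
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Assumption: returns 0.0 when either vector has zero magnitude.
double cosineSimilarity(Map<String, int> a, Map<String, int> b) {
  var dot = 0;
  a.forEach((term, count) {
    dot += count * (b[term] ?? 0);
  });

  // Euclidean norm of a term-frequency vector.
  double norm(Map<String, int> m) =>
      math.sqrt(m.values.fold<int>(0, (sum, c) => sum + c * c));

  final denominator = norm(a) * norm(b);
  if (denominator == 0) return 0.0;
  return dot / denominator;
}

/// Bag-of-words similarity of two strings.
double textSimilarity(String a, String b) =>
    cosineSimilarity(textToTf(a), textToTf(b));
```

Under these assumptions, textSimilarity('the cat sat', 'the cat ran') comes out at roughly 0.667: the two texts share two of three terms, so the dot product is 2 and each vector has magnitude sqrt(3). Because term counts are never negative, the score cannot fall below 0.0.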