string/duplicate_doc_utils library
Near-duplicate document detector via fingerprints (roadmap #438).
Functions
clusterNearDuplicates(List<String> documents, {double threshold = 0.85}) → List<List<int>>
Groups documents into near-duplicate clusters (greedy). Audited: 2026-06-12 11:26 EDT

isNearDuplicate(String a, String b, {double threshold = 0.85}) → bool
Returns true if a and b are near-duplicates (cosine similarity >= threshold). Audited: 2026-06-12 11:26 EDT
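This page does not say how fingerprints are built or how the greedy pass assigns documents to clusters. The following is a minimal Dart sketch of both functions under two assumptions. First, a fingerprint is a term-frequency vector over lowercase alphanumeric tokens. Second, each document joins the first existing cluster whose first member it matches, and otherwise starts a new cluster. The helpers _fingerprint and _cosine are hypothetical illustrations and are not part of this library.

```dart
import 'dart:math' as math;

/// HYPOTHETICAL fingerprint: term-frequency map over lowercase
/// alphanumeric tokens. The library's real fingerprint scheme is undocumented.
Map<String, int> _fingerprint(String text) {
  final counts = <String, int>{};
  for (final match in RegExp(r'[a-z0-9]+').allMatches(text.toLowerCase())) {
    final token = match.group(0)!;
    counts[token] = (counts[token] ?? 0) + 1;
  }
  return counts;
}

/// Cosine similarity of two term-frequency maps.
/// An empty fingerprint scores 0.0, so empty documents never match.
double _cosine(Map<String, int> x, Map<String, int> y) {
  if (x.isEmpty || y.isEmpty) return 0.0;
  var dot = 0;
  x.forEach((token, count) {
    dot += count * (y[token] ?? 0);
  });
  double norm(Map<String, int> v) =>
      math.sqrt(v.values.fold<int>(0, (sum, c) => sum + c * c));
  return dot / (norm(x) * norm(y));
}

/// Sketch: near-duplicate when cosine similarity >= threshold.
bool isNearDuplicate(String a, String b, {double threshold = 0.85}) =>
    _cosine(_fingerprint(a), _fingerprint(b)) >= threshold;

/// Sketch of a greedy clustering pass (ASSUMED strategy): each document is
/// compared with the first member of every existing cluster and joins the
/// first match, otherwise it starts a new cluster. Clusters hold indices.
List<List<int>> clusterNearDuplicates(List<String> documents,
    {double threshold = 0.85}) {
  final prints = documents.map(_fingerprint).toList();
  final clusters = <List<int>>[];
  for (var i = 0; i < documents.length; i++) {
    List<int>? home;
    for (final cluster in clusters) {
      if (_cosine(prints[cluster.first], prints[i]) >= threshold) {
        home = cluster;
        break;
      }
    }
    if (home != null) {
      home.add(i);
    } else {
      clusters.add([i]);
    }
  }
  return clusters;
}
```

Under these assumptions, "the quick brown fox jumps over the lazy dog" and "the quick brown fox jumped over the lazy dog" have a cosine similarity of 10/11 (about 0.91), so isNearDuplicate returns true at the default threshold. Comparing against only a cluster's first member keeps the pass to one comparison per existing cluster. However, the resulting clusters can depend on input order.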