punctuationRegex property
Regular expression to remove punctuation while preserving alphabetic characters, numbers, and whitespace.
This regex, r'[^\p{L}\s\p{N}]+', works as follows:
[^...]: Negation - matches any character that is not in the set.
\p{L}: Unicode property for "Letter". This matches any kind of letter from any language.
\s: Whitespace characters (spaces, tabs, newlines, etc.).
\p{N}: Unicode property for "Number". This matches any kind of numeric character in any script.
+: Matches one or more occurrences of the preceding element.
Therefore, the entire regex matches runs of one or more characters that are not letters, whitespace, or numbers. Replacing each match with an empty string removes punctuation and symbols while preserving letters, whitespace, and numbers.
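As a rough illustration of the behavior described above, the following sketch applies the same pattern with String.replaceAll. The pattern is redeclared locally so the snippet runs on its own, and the stripPunctuation helper is a hypothetical name used only for this example, not part of the documented API.

```dart
// Same pattern as the documented property, redeclared here so the
// sketch is self-contained.
final RegExp punctuationRegex =
    RegExp(r'[^\p{L}\s\p{N}]+', unicode: true);

// Hypothetical helper (an assumption for illustration only): removes
// every run of characters that are not letters, whitespace, or numbers.
String stripPunctuation(String input) =>
    input.replaceAll(punctuationRegex, '');

void main() {
  // ASCII punctuation is removed; spaces and digits are kept.
  print(stripPunctuation('Hello, world! 123')); // Hello world 123

  // Letters from other scripts are kept because \p{L} is script-agnostic.
  print(stripPunctuation('Стоп! 42%')); // Стоп 42

  // Underscores and hyphens are not letters or numbers, so they are removed.
  print(stripPunctuation('snake_case-name')); // snakecasename
}
```

Because \p{L} does not cover combining marks, text stored in decomposed form (a base letter followed by a separate accent code point) would likely lose its accents; normalizing such input to a precomposed form first may be worth considering.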
Implementation
static final RegExp punctuationRegex = RegExp(r'[^\p{L}\s\p{N}]+', unicode: true);