text_parser #

A Dart package for parsing text flexibly according to preset or custom regular expression patterns.

Usage #

Using preset matchers (URL / email address / phone number) #

If matchers is omitted in TextParser, the three preset matchers (UrlMatcher, EmailMatcher and TelMatcher) are used automatically.

The default regular expression pattern of each of them is not very strict. If it is unsuitable for your use case, override the pattern yourself as described in "Overwriting the pattern of a preset matcher" below.

import 'package:text_parser/text_parser.dart';

Future<void> main() async {
  const text = 'abc https://example.com/sample.jpg. def\n'
      'foo@example.com +1-012-3456-7890';

  final parser = TextParser();
  final elements = await parser.parse(text);
  elements.forEach(print);
}

Output:

matcherType: TextMatcher, text: abc , groups: []
matcherType: UrlMatcher, text: https://example.com/sample.jpg, groups: []
matcherType: TextMatcher, text: . def\n, groups: []
matcherType: EmailMatcher, text: foo@example.com, groups: []
matcherType: TextMatcher, text:  , groups: []
matcherType: TelMatcher, text: +1-012-3456-7890, groups: []

Obtaining only matching text elements #

By default, the result of parse() contains both matching and non-matching elements as seen in the above example. If you want only matching elements, set onlyMatches to true when calling parse().

final elements = await parser.parse(text, onlyMatches: true);
elements.forEach(print);

Output:

matcherType: UrlMatcher, text: https://example.com/sample.jpg, groups: []
matcherType: EmailMatcher, text: foo@example.com, groups: []
matcherType: TelMatcher, text: +1-012-3456-7890, groups: []

Overwriting the pattern of a preset matcher #

To parse only URLs and phone numbers, and to treat only a sequence of eleven digits after "tel:" as a phone number:

final parser = TextParser(
  matchers: const [
    UrlMatcher(),
    TelMatcher(r'(?<=tel:)\d{11}'),
  ],
);

If the patterns of multiple matchers match the same string at the same position in the text, the matcher listed first in matchers is used for parsing that element.
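To see what the overridden pattern accepts on its own, it can be tried directly with RegExp from dart:core, without the package. This is only a sketch of the pattern's behavior; the helper name findTelNumbers and the sample strings are made up for illustration.

```dart
/// Returns every substring of [text] matched by the overridden tel pattern.
/// (Hypothetical helper for illustration; not part of text_parser.)
List<String> findTelNumbers(String text) {
  // Same pattern as passed to TelMatcher above:
  // eleven digits that are immediately preceded by "tel:".
  final pattern = RegExp(r'(?<=tel:)\d{11}');
  return pattern.allMatches(text).map((m) => m.group(0)!).toList();
}

void main() {
  // Only the number that follows "tel:" is picked up.
  print(findTelNumbers('tel:01234567890 or 09876543210')); // [01234567890]
}
```

Because "tel:" sits inside a lookbehind, it is required for a match but is not part of the matched text.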

Using a custom matcher #

You can create a custom matcher by extending TextMatcher.

The following is an example of a custom matcher that parses HTML <a> tags, capturing the href value and the link text as groups.

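class ATagMatcher extends TextMatcher {
  const ATagMatcher()
      : super(
          // Opening tag; the first group captures the href value.
          r'<a\s+(?:.+)?href="(.+?)"\s?(?:.+)?>'
          // The second group captures the link text without surrounding whitespace.
          r'(?:\s+)?(.+?)(?:\s+)?'
          // Closing tag.
          r'</a>',
        );
}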

const text = '''
<a class="bar" href="https://example.com/">
  Content inside tags
</a>
''';

final parser = TextParser(
  matchers: const [ATagMatcher()],
  dotAll: true,
);
final elements = await parser.parse(text, onlyMatches: true);
print(elements[0].groups);

Output:

[https://example.com/, Content inside tags]

Groups #

Each TextElement in a parse result has a groups property. It is a list of the strings matched by the smaller pattern inside each set of parentheses ( ).

Taking the above code as an example, there are two sets of capturing parentheses, both (.+?): one in href="(.+?)" and one between the opening and closing tags. They match "https://example.com/" and "Content inside tags" respectively, so they are added to the list in that order.

Tip:

If you do not want certain parentheses to be captured as a group, add ?: after the opening parenthesis, like (?:pattern) instead of (pattern).
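The difference between capturing and non-capturing parentheses can be checked with plain RegExp from dart:core. The sketch below does not use the package; capturedGroups and the sample pattern are hypothetical and only illustrate which parentheses end up as groups.

```dart
/// Returns the captured groups of the first match of [pattern] in [text],
/// or an empty list when there is no match.
/// (Hypothetical helper for illustration; not part of text_parser.)
List<String?> capturedGroups(RegExp pattern, String text) {
  final match = pattern.firstMatch(text);
  if (match == null) return [];
  // Group 0 is the whole match, so captured groups start at index 1.
  return [for (var i = 1; i <= match.groupCount; i++) match.group(i)];
}

void main() {
  const text = 'key=value';
  // Three capturing groups, including the "=" in the middle.
  print(capturedGroups(RegExp(r'(\w+)(=)(\w+)'), text)); // [key, =, value]
  // (?:=) still has to match, but it is not captured.
  print(capturedGroups(RegExp(r'(\w+)(?:=)(\w+)'), text)); // [key, value]
}
```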

RegExp options #

How a regular expression is treated can be configured in the TextParser constructor.

multiLine
caseSensitive
unicode
dotAll

These options are passed to RegExp internally, so refer to the RegExp documentation for details.
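As a rough illustration of what each option changes, the sketch below passes the same four named parameters straight to the RegExp constructor from dart:core. It does not go through TextParser; the helper name matches and the sample inputs are assumptions made for this example.

```dart
/// Reports whether [source] matches anywhere in [input] under the given
/// RegExp options. (Hypothetical helper for illustration.)
bool matches(
  String source,
  String input, {
  bool multiLine = false,
  bool caseSensitive = true,
  bool unicode = false,
  bool dotAll = false,
}) {
  final regExp = RegExp(
    source,
    multiLine: multiLine,
    caseSensitive: caseSensitive,
    unicode: unicode,
    dotAll: dotAll,
  );
  return regExp.hasMatch(input);
}

void main() {
  // dotAll: "." also matches line terminators.
  print(matches(r'a.c', 'a\nc')); // false
  print(matches(r'a.c', 'a\nc', dotAll: true)); // true

  // multiLine: "^" and "$" also match at line boundaries.
  print(matches(r'^b', 'a\nb')); // false
  print(matches(r'^b', 'a\nb', multiLine: true)); // true

  // caseSensitive: set to false to ignore case.
  print(matches(r'abc', 'ABC')); // false
  print(matches(r'abc', 'ABC', caseSensitive: false)); // true

  // unicode: "." matches a whole code point instead of one UTF-16 code unit.
  print(matches(r'^.$', '\u{1F600}')); // false
  print(matches(r'^.$', '\u{1F600}', unicode: true)); // true
}
```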

Limitations #

It may take seconds to parse a very long string with multiple complex match patterns.
On the web, where dart:isolate is not supported, parsing is executed in the main thread instead of in an isolate.

Troubleshooting #

Positive lookbehind sometimes does not work. #

e.g.

Text to be parsed
- '123abc456'
Match pattern 1
- r'\d+'
  - Any sequence of digits
Match pattern 2
- r'(?<=\d)[a-z]+'
  - Letters after a digit

In the above example, you may expect the first match to be "123" and the next match to be "abc", but the second match is actually "456".

This is due to the mechanism of this package that excludes already searched parts of text in later search iterations; "123" is found in the first iteration, and then the next iteration is targeted at "abc456", which does not match (?<=\d).

An easy solution is to add ^ to the positive lookbehind condition, like (?<=\d|^).
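The effect can be reproduced with plain RegExp from dart:core by searching only the remainder of the text, as described above. This is a sketch of the behavior, not the package's actual implementation; the helper firstMatchOf and the assumption that the remainder is exactly "abc456" are for illustration only.

```dart
/// Returns the first substring of [text] matched by [source], or null.
/// (Hypothetical helper for illustration; not part of text_parser.)
String? firstMatchOf(String source, String text) =>
    RegExp(source).firstMatch(text)?.group(0);

void main() {
  // Assumed remainder of '123abc456' once "123" has been consumed.
  const remainder = 'abc456';

  // No digit precedes "abc" any more, so the lookbehind fails.
  print(firstMatchOf(r'(?<=\d)[a-z]+', remainder)); // null

  // Allowing the start of the text as an alternative makes "abc" match.
  print(firstMatchOf(r'(?<=\d|^)[a-z]+', remainder)); // abc

  // On the full text, the original pattern works as expected.
  print(firstMatchOf(r'(?<=\d)[a-z]+', '123abc456')); // abc
}
```

Note that with ^ in the lookbehind, the pattern also matches letters at the very start of the text, where no digit precedes them.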

Note: Safari versions earlier than 16.4 do not support lookbehind assertions.

text_parser 0.2.0