porter_2_stemmer 0.0.2

Dart implementation of the Porter stemming algorithm, used to reduce a word to its word stem, base or root form.

What is the Porter Stemming Algorithm?

The Porter Stemming Algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up information retrieval systems.

The English (Porter2) stemming algorithm was developed as part of "Snowball", a small string processing language designed for creating stemming algorithms for use in information retrieval.

The Porter 2 algorithm is Copyright (c) 2001, Dr Martin Porter and Copyright (c) 2002, Richard Boulton and licensed under the BSD 3-Clause License.

Install

In the pubspec.yaml of your Dart or Flutter project, add the following dependency:

dependencies:
  porter_2_stemmer: <latest_version>

In your library add the following import:

import 'package:porter_2_stemmer/porter_2_stemmer.dart';

Usage

A string extension is provided, and is the simplest way to get stemming:

import 'package:porter_2_stemmer/porter_2_stemmer.dart';

/// Iterates through a collection of terms/words and prints the stem for each
/// term.
void main() {
  // Collection of terms/words for which stems are printed.
  final terms = [
    'sky’s',
    'skis',
    'TSLA',
    'APPLE:NASDAQ',
    'consolatory',
    '"news"',
    "mother's",
    'generally',
    'consignment'
  ];

  // Iterate through the [terms] and print the stem for each term.
  for (final term in terms) {
    // Get the stem for the [term].
    final stem = term.stemPorter2();

    // Print one "term => stem" line per term.
    print('$term => $stem');
  }
}

Alternatively, instantiate a Porter2Stemmer instance, optionally passing your preferred exceptions, and call the stem method.

import 'package:porter_2_stemmer/porter_2_stemmer.dart';

/// Instantiates a [Porter2Stemmer] instance using a custom exception for
/// the term "TSLA".
///
/// Prints the terms and their stems.
void main() {
  // Collection of terms/words for which stems are printed.
  final terms = [
    'sky’s',
    'skis',
    'TSLA',
    'APPLE:NASDAQ',
    'apple.com',
    'consolatory',
    '"news"',
    "mother's",
    'generally',
    'consignment'
  ];

  // Preserve the default exceptions.
  final exceptions = Map<String, String>.from(Porter2Stemmer.kExceptions);

  // Add a custom exception for "TSLA".
  exceptions['TSLA'] = 'tesla';

  // Instantiate the [Porter2Stemmer] instance using the custom [exceptions]
  final stemmer = Porter2Stemmer(exceptions: exceptions);

  // Iterate through the [terms] and print the stem for each term.
  for (final term in terms) {
    // Get the stem for the [term].
    final stem = stemmer.stem(term);

    // Print one "term => stem" line per term.
    print('$term => $stem');
  }
}

To implement custom exceptions to the algorithm, pass the exceptions parameter, a Map<String, String> that maps each term (key) to its stem (value). The default exceptions are:

/// Default exceptions used by [Porter2Stemmer].
static const kExceptions = {
  'skis': 'ski',
  'skies': 'sky',
  'dying': 'die',
  'lying': 'lie',
  'tying': 'tie',
  'idly': 'idl',
  'gently': 'gentl',
  'ugly': 'ugli',
  'early': 'earli',
  'only': 'onli',
  'singly': 'singl',
};

Quotation marks, apostrophes and non-language terms

This implementation:

  • converts all quotation marks and apostrophes to a standard single quote character, U+0027 (ASCII hex 27); and

  • strips all leading and trailing quotation marks from the term before processing begins.

Terms that match the following criteria (after stripping quotation marks and the possessive apostrophe "s") are returned unchanged, as they are considered to be acronyms, identifiers or non-language terms that have a specific meaning:

  • terms that are in all-capitals, e.g. TSLA;
  • terms that contain any non-word characters (anything other than letters, apostrophes and hyphens), e.g. apple.com, alibaba:xnys.
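
The quotation-mark handling described above can be sketched in plain Dart. This is only an illustration of the idea, not the package's internal code: the helper name normalizeQuotes is hypothetical, and the set of quotation characters it converts is an assumption.

```dart
/// Hypothetical helper (not part of porter_2_stemmer) that mimics the
/// quotation-mark normalisation described above.
String normalizeQuotes(String term) {
  // Map typographic quotes, double quotes and backticks to U+0027.
  // The character set here is an assumption chosen for illustration.
  final unified = term.replaceAll(RegExp('[\u2018\u2019\u201C\u201D"`]'), "'");

  // Strip leading and trailing single quotes.
  return unified.replaceAll(RegExp(r"^'+|'+$"), '');
}

void main() {
  print(normalizeQuotes('"news"')); // news
  print(normalizeQuotes('sky\u2019s')); // sky's
}
```

Interior apostrophes are kept, so a possessive such as sky's still reaches the stemmer with its apostrophe in the standard form.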

To stem all-capitals terms, convert them to lowercase before processing. To stem terms that contain non-word characters, split them and stem the parts separately.
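
A minimal sketch of that pre-processing follows, in plain Dart. The helper name splitForStemming is hypothetical, and treating only ASCII letters, apostrophes and hyphens as word characters is a simplifying assumption; the package's own definition of a word character may differ.

```dart
/// Hypothetical helper (not part of porter_2_stemmer) that lowercases a term
/// and splits it on runs of non-word characters.
List<String> splitForStemming(String term) {
  return term
      .toLowerCase()
      // Assumption: word characters are ASCII letters, apostrophes, hyphens.
      .split(RegExp(r"[^a-z'-]+"))
      // Drop empty parts produced by leading or trailing separators.
      .where((part) => part.isNotEmpty)
      .toList();
}

void main() {
  for (final term in ['TSLA', 'APPLE:NASDAQ', 'apple.com']) {
    print('$term => ${splitForStemming(term)}');
  }
  // TSLA => [tsla]
  // APPLE:NASDAQ => [apple, nasdaq]
  // apple.com => [apple, com]
}
```

Each resulting part can then be passed to stemPorter2() or to the stem method of a Porter2Stemmer instance, as in the Usage examples.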

Contributions

Feel free to contribute to this project:

  • If you find a bug or want a feature, but don't know how to fix/implement it, please file an issue.
  • If you fixed a bug or implemented a feature, please send a pull request.

This project is a supporting package for a revenue project that has priority call on resources, so please be patient if we don't respond immediately to issues or pull requests.

Publisher: gmconsult.com.au (verified publisher)
