ml_algo 9.2.4

Machine learning algorithms written in native Dart (without bindings to any popular ML libraries, just a pure Dart implementation)

Machine learning algorithms with Dart

What is ml_algo for?

The main purpose of the library is to give developers who are interested in both the Dart language and data science a native Dart implementation of machine learning algorithms. The library targets the Dart VM, so for the smoothest experience, please do not use it in a browser.

The following algorithms are implemented:

  • Linear regression:

    • Gradient descent algorithm (batch, mini-batch, stochastic) with ridge regularization
    • Lasso regression (feature selection model)
  • Linear classifier:

    • Logistic regression (with "one-vs-all" multiclass classification)
    • Softmax regression
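
To make the difference between the batch, mini-batch and stochastic variants concrete, here is a minimal pure-Dart sketch of one gradient descent step with a ridge penalty. It is not the library's implementation: the function name ridgeStep, its parameters, and the toy data are assumptions made for illustration. The only thing that changes between the variants is how many row indices are passed in batch.

```dart
/// One gradient descent step for linear regression with a ridge penalty.
/// Hypothetical helper for illustration, not part of ml_algo's API.
/// [batch] holds the row indices used for this step:
/// all rows (batch), a few rows (mini-batch) or one row (stochastic).
List<double> ridgeStep(List<List<double>> x, List<double> y, List<double> w,
    List<int> batch, double learningRate, double lambda) {
  // Start from the gradient of the ridge penalty (lambda / 2) * |w|^2.
  final gradient = List<double>.generate(w.length, (j) => lambda * w[j]);
  for (final i in batch) {
    var prediction = 0.0;
    for (var j = 0; j < w.length; j++) {
      prediction += w[j] * x[i][j];
    }
    final residual = prediction - y[i];
    // Add this row's share of the mean squared error gradient.
    for (var j = 0; j < w.length; j++) {
      gradient[j] += residual * x[i][j] / batch.length;
    }
  }
  return List<double>.generate(
      w.length, (j) => w[j] - learningRate * gradient[j]);
}

void main() {
  // Toy data generated from y = 2 * x, so the ideal weight is 2.0.
  final x = [
    [1.0],
    [2.0],
    [3.0]
  ];
  final y = [2.0, 4.0, 6.0];

  var batchW = [0.0];
  var stochasticW = [0.0];
  for (var step = 0; step < 200; step++) {
    // Batch: every row contributes to each step.
    batchW = ridgeStep(x, y, batchW, [0, 1, 2], 0.1, 0.0);
    // Stochastic: one row per step (cycled here instead of sampled randomly).
    stochasticW = ridgeStep(x, y, stochasticW, [step % 3], 0.1, 0.0);
  }
  print('batch weight: ${batchW[0].toStringAsFixed(3)}');
  print('stochastic weight: ${stochasticW[0].toStringAsFixed(3)}');
}
```

With lambda set to 0.0 both variants approach the weight 2.0 on this noise-free data; a positive lambda would pull the weight towards zero.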

The library's structure

To cover the main tasks of machine learning, the library exposes several classes. The examples in this document use DataFrame, CrossValidator and LinearClassifier.

Examples

Logistic regression

Let's classify records from a well-known dataset, the Pima Indians Diabetes Database, using a logistic regressor.

Import all necessary packages:

import 'dart:async';

import 'package:ml_algo/ml_algo.dart';

Read the csv file pima_indians_diabetes_database.csv with the test data. You can use the csv file from the library's datasets directory:

final data = DataFrame.fromCsv('datasets/pima_indians_diabetes_database.csv', 
  labelName: 'class variable (0 or 1)');
final features = await data.features;
final labels = await data.labels;

The data in this file consists of 768 records with 8 features. The 9th column is the label column; it contains either 0 or 1 in each row. This column is our target: we should predict the class label for each observation. Therefore, we need to specify where the label values come from. We use the labelName parameter for that (the name of the label column, 'class variable (0 or 1)' in our case).

The processed features and labels are stored in data structures of the Matrix type. For more information about the Matrix type, please visit the ml_linalg repo.

Then we create an instance of the CrossValidator class for fitting the hyperparameters of our model:

final validator = CrossValidator.kFold(numberOfFolds: 5);
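
As a rough picture of what k-fold cross-validation does with the records, the sketch below splits row indices into contiguous folds in pure Dart. The function kFoldIndices is a hypothetical helper written for this illustration; how the library actually builds or shuffles its folds is not shown in this document.

```dart
/// Splits the indices 0..n-1 into [k] contiguous folds of near-equal size.
/// Illustrative only: this is not ml_algo's internal implementation.
List<List<int>> kFoldIndices(int n, int k) {
  if (k < 2 || k > n) {
    throw ArgumentError('k must be between 2 and n');
  }
  final folds = <List<int>>[];
  final baseSize = n ~/ k;
  final remainder = n % k;
  var start = 0;
  for (var i = 0; i < k; i++) {
    // The first `remainder` folds take one extra index.
    final size = baseSize + (i < remainder ? 1 : 0);
    folds.add(List<int>.generate(size, (j) => start + j));
    start += size;
  }
  return folds;
}

void main() {
  // 768 records, as in the Pima dataset, split into 5 folds.
  final folds = kFoldIndices(768, 5);
  for (final testFold in folds) {
    // Each fold serves once as the test set; the rest is training data.
    final trainSize = 768 - testFold.length;
    print('test: ${testFold.length}, train: $trainSize');
  }
}
```

Each record ends up in the test set exactly once, so the final score is an average over 5 train/test rounds rather than a single split.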

Everything is set, so we can perform our classification.

Let's create a logistic regression classifier instance with a full-batch gradient descent optimizer:

final model = LinearClassifier.logisticRegressor(
    initialLearningRate: .8,
    iterationsLimit: 500,
    gradientType: GradientType.batch,
    fitIntercept: true,
    interceptScale: 0.1,
    learningRateType: LearningRateType.constant);
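
For readers who want to see what such a configuration roughly corresponds to, here is a minimal pure-Dart sketch of logistic regression trained with full-batch gradient descent and a constant learning rate. The names sigmoid, dot and fitLogistic and the toy data are assumptions made for this example; the intercept is modelled by a constant 1.0 column, which may differ from how the library handles fitIntercept and interceptScale.

```dart
import 'dart:math' as math;

/// Logistic (sigmoid) function: maps any real score to the range (0, 1).
double sigmoid(double z) => 1.0 / (1.0 + math.exp(-z));

/// Dot product of a weight vector and a feature row.
double dot(List<double> w, List<double> row) {
  var sum = 0.0;
  for (var j = 0; j < w.length; j++) {
    sum += w[j] * row[j];
  }
  return sum;
}

/// Fits weights with full-batch gradient descent on the mean log loss.
/// Every iteration uses all rows of [x]; the learning rate stays constant.
/// Hypothetical helper for illustration, not part of ml_algo's API.
List<double> fitLogistic(List<List<double>> x, List<double> y,
    {double learningRate = 0.8, int iterations = 500}) {
  var w = List<double>.filled(x.first.length, 0.0);
  for (var step = 0; step < iterations; step++) {
    final gradient = List<double>.filled(w.length, 0.0);
    for (var i = 0; i < x.length; i++) {
      final error = sigmoid(dot(w, x[i])) - y[i];
      for (var j = 0; j < w.length; j++) {
        gradient[j] += error * x[i][j] / x.length;
      }
    }
    w = List<double>.generate(
        w.length, (j) => w[j] - learningRate * gradient[j]);
  }
  return w;
}

void main() {
  // Toy data: the first column is a constant 1.0 acting as the intercept.
  final x = [
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0]
  ];
  final y = [0.0, 0.0, 1.0, 1.0];
  final w = fitLogistic(x, y);
  for (final row in x) {
    final p = sigmoid(dot(w, row)).toStringAsFixed(2);
    print('x = ${row[1]}: p(class 1) = $p');
  }
}
```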

Evaluate our model using the accuracy metric:

final accuracy = validator.evaluate(model, features, labels, MetricType.accuracy);

Let's print the score:

print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');

We will see something like this:

accuracy on classification: 0.77
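
The accuracy metric itself is simply the share of observations whose predicted label matches the true one. A minimal sketch in pure Dart, using a hypothetical accuracy function and made-up labels rather than the library's MetricType machinery:

```dart
/// Fraction of predictions that match the actual labels.
/// Hypothetical helper for illustration, not part of ml_algo's API.
double accuracy(List<int> predicted, List<int> actual) {
  if (predicted.length != actual.length || actual.isEmpty) {
    throw ArgumentError('label lists must be non-empty and of equal length');
  }
  var correct = 0;
  for (var i = 0; i < actual.length; i++) {
    if (predicted[i] == actual[i]) {
      correct++;
    }
  }
  return correct / actual.length;
}

void main() {
  // Three of the four made-up predictions are right.
  final score = accuracy([1, 0, 1, 1], [1, 0, 0, 1]);
  print(score.toStringAsFixed(2)); // prints 0.75
}
```

Read this way, a score of 0.77 means that about 77% of the held-out records were classified correctly.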

All the code above, put together:

import 'dart:async';

import 'package:ml_algo/ml_algo.dart';

Future main() async {
  final data = DataFrame.fromCsv('datasets/pima_indians_diabetes_database.csv', 
     labelName: 'class variable (0 or 1)');
  
  final features = await data.features;
  final labels = await data.labels;

  final validator = CrossValidator.kFold(numberOfFolds: 5);
  
  final model = LinearClassifier.logisticRegressor(
    initialLearningRate: .8,
    iterationsLimit: 500,
    gradientType: GradientType.batch,
    fitIntercept: true,
    interceptScale: .1,
    learningRateType: LearningRateType.constant);
  
  final accuracy = validator.evaluate(model, features, labels, MetricType.accuracy);

  print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');
}

Softmax regression

Let's classify another famous dataset, the Iris dataset. The data in this csv is separated into 3 classes, so we need a different approach to classification: softmax regression.

As usual, start with data preparation:

final data = DataFrame.fromCsv('datasets/iris.csv',
    labelName: 'Species',
    columns: [const Tuple2(1, 5)],
    categoryNameToEncoder: {
      'Species': CategoricalDataEncoderType.oneHot,
    },
);

final features = await data.features;
final labels = await data.labels;

The csv database has 6 columns, but we need to get rid of the first one, because it contains only the ID of each observation, which carries no information for the classifier. So, as you may notice, we provided a column range that excludes the ID column:

columns: [const Tuple2(1, 5)]
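
As a plain-Dart illustration of what such a range selects, the sketch below keeps columns 1 through 5 of a single parsed row. It assumes that both ends of the range are inclusive, which matches the description above (6 columns, first one dropped); selectColumns and the sample row are hypothetical.

```dart
/// Returns the values in columns [first]..[last], both ends inclusive.
/// Hypothetical helper for illustration, not part of ml_algo's API.
List<T> selectColumns<T>(List<T> row, int first, int last) =>
    row.sublist(first, last + 1);

void main() {
  // A made-up row: ID, four measurements, species label.
  final row = ['1', '5.1', '3.5', '1.4', '0.2', 'Iris-setosa'];
  // Drop the ID in column 0 and keep columns 1..5.
  print(selectColumns(row, 1, 5));
  // prints [5.1, 3.5, 1.4, 0.2, Iris-setosa]
}
```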

Also, since the label column 'Species' contains categorical data, we encoded it to a numerical format:

categoryNameToEncoder: {
  'Species': CategoricalDataEncoderType.oneHot,
},

To see how encoding works, visit the api reference.
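
In short, one-hot encoding replaces each category with a vector that has a single 1 at the position assigned to that category and 0 everywhere else. A minimal pure-Dart sketch, with a hypothetical oneHotEncode function and illustrative species names (the actual strings in the csv may differ):

```dart
/// Maps every distinct label to a one-hot vector.
/// Positions follow the order in which labels first appear.
/// Hypothetical helper for illustration, not part of ml_algo's API.
Map<String, List<double>> oneHotEncode(Iterable<String> labels) {
  // toSet() on a list keeps first-occurrence order and removes duplicates.
  final categories = labels.toList().toSet().toList();
  final encoded = <String, List<double>>{};
  for (var i = 0; i < categories.length; i++) {
    encoded[categories[i]] = List<double>.generate(
        categories.length, (j) => j == i ? 1.0 : 0.0);
  }
  return encoded;
}

void main() {
  final codes =
      oneHotEncode(['setosa', 'versicolor', 'virginica', 'setosa']);
  print(codes['setosa']); // prints [1.0, 0.0, 0.0]
  print(codes['versicolor']); // prints [0.0, 1.0, 0.0]
  print(codes['virginica']); // prints [0.0, 0.0, 1.0]
}
```

With three species, every label becomes a vector of length 3, which matches the three outputs a softmax classifier produces.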

The next step is to create a cross validator instance:

final validator = CrossValidator.kFold(numberOfFolds: 5);

And finally, create an instance of the classifier:

final softmaxRegressor = LinearClassifier.softmaxRegressor(
      initialLearningRate: 0.03,
      iterationsLimit: null,
      minWeightsUpdate: 1e-6,
      randomSeed: 46,
      learningRateType: LearningRateType.constant);
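
The core of softmax regression is the softmax function, which turns one raw score per class into probabilities that sum to 1. The sketch below shows it in pure Dart; the function name and the sample scores are assumptions for illustration, and the snippet says nothing about how the library computes it internally.

```dart
import 'dart:math' as math;

/// Converts raw class scores into probabilities that sum to 1.
/// Hypothetical helper for illustration, not part of ml_algo's API.
List<double> softmax(List<double> scores) {
  // Subtracting the largest score avoids overflow in exp()
  // without changing the result.
  final maxScore = scores.reduce((a, b) => a > b ? a : b);
  final exps = scores.map((s) => math.exp(s - maxScore)).toList();
  final sum = exps.reduce((a, b) => a + b);
  return exps.map((e) => e / sum).toList();
}

void main() {
  // Made-up scores for the three Iris classes.
  final probabilities = softmax([2.0, 1.0, 0.1]);
  for (final p in probabilities) {
    print(p.toStringAsFixed(3));
  }
  // The predicted class is the one with the highest probability,
  // here the first one.
}
```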

Evaluate the quality of prediction:

final accuracy = validator.evaluate(softmaxRegressor, features, labels, MetricType.accuracy);

print('Iris dataset, softmax regression: accuracy is '
  '${accuracy.toStringAsFixed(2)}'); // It yields 0.93

All the code above, put together:

import 'dart:async';

import 'package:ml_algo/ml_algo.dart';
import 'package:tuple/tuple.dart';

Future main() async {
  final data = DataFrame.fromCsv('datasets/iris.csv',
    labelName: 'Species',
    columns: [const Tuple2(1, 5)],
    categoryNameToEncoder: {
      'Species': CategoricalDataEncoderType.oneHot,
    },
  );

  final features = await data.features;
  final labels = await data.labels;

  final validator = CrossValidator.kFold(numberOfFolds: 5);

  final softmaxRegressor = LinearClassifier.softmaxRegressor(
      initialLearningRate: 0.03,
      iterationsLimit: null,
      minWeightsUpdate: 1e-6,
      randomSeed: 46,
      learningRateType: LearningRateType.constant);

  final accuracy = validator.evaluate(
      softmaxRegressor, features, labels, MetricType.accuracy);

  print('Iris dataset, softmax regression: accuracy is '
      '${accuracy.toStringAsFixed(2)}');
}

For more examples, please see the examples folder.

Contacts

If you have questions, feel free to write me on

Publisher

verified publisher: ml-algo.com

License

unknown (LICENSE)

Dependencies

csv, logging, ml_linalg, tuple