Machine learning algorithms with Dart
What is ml_algo for?
The main purpose of the library is to give developers who are interested both in the Dart language and in data science a native Dart implementation of machine learning algorithms. The library is targeted at the Dart VM, so to get the smoothest experience, please do not use it in a browser.
The following algorithms are implemented:

Linear regression:
 Gradient descent based linear regression
 Coordinate descent based linear regression

Linear classifier:
 Logistic regression
 Softmax regression

Nonparametric regression:
 KNN regression
The library's structure

Model selection
 CrossValidator. A factory that creates instances of a cross-validator. Cross-validation allows researchers to tune the hyperparameters of machine learning algorithms by assessing prediction quality on different parts of a dataset.
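To make the idea of k-fold cross-validation concrete, here is a minimal pure-Dart sketch. It does not use the ml_algo API: `kFoldTestIndices` and `crossValidate` are hypothetical helpers, and the contiguous (non-shuffled) fold layout is an assumption of this sketch. The data is split into k folds; each fold serves once as the test part while the remaining folds form the training part, and the per-fold scores are averaged.

```dart
/// Hypothetical helper (not part of ml_algo): splits indices 0..n-1 into
/// [k] contiguous folds of near-equal size and returns the test indices
/// of every fold.
List<List<int>> kFoldTestIndices(int n, int k) {
  if (k < 2 || k > n) {
    throw ArgumentError('k must be in range [2, n]');
  }
  final folds = <List<int>>[];
  final baseSize = n ~/ k;
  final remainder = n % k;
  var start = 0;
  for (var fold = 0; fold < k; fold++) {
    // The first `remainder` folds get one extra element.
    final size = baseSize + (fold < remainder ? 1 : 0);
    folds.add(List<int>.generate(size, (i) => start + i));
    start += size;
  }
  return folds;
}

/// Hypothetical helper: averages [scoreFold] over all folds. The callback
/// is expected to train on `trainIdx` and return a score on `testIdx`.
double crossValidate(int n, int k,
    double Function(List<int> trainIdx, List<int> testIdx) scoreFold) {
  final folds = kFoldTestIndices(n, k);
  var total = 0.0;
  for (final testIdx in folds) {
    final testSet = testIdx.toSet();
    // Every index that is not in the current test fold goes to training.
    final trainIdx = [
      for (var i = 0; i < n; i++)
        if (!testSet.contains(i)) i,
    ];
    total += scoreFold(trainIdx, testIdx);
  }
  return total / folds.length;
}
```

With 10 observations and 5 folds, every fold holds 2 test observations and 8 training observations.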

Classification algorithms

Linear classification

Logistic regression
An algorithm that performs linear binary classification.

LogisticRegressor.gradient. Logistic regression with gradient ascent optimization of the log-likelihood cost function. To use this kind of classifier, your data should be (close to) linearly separable.

LogisticRegressor.coordinate. Not implemented yet. Logistic regression with coordinate descent optimization of the negated log-likelihood cost function. Coordinate descent allows feature selection to be performed (via L1 regularization). To use this kind of classifier, your data should be (close to) linearly separable.
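The gradient ascent variant described above can be sketched in a few lines of plain Dart. This is not ml_algo's implementation: `fitLogistic`, `dot` and `predictClass` are hypothetical names, the fixed iteration count and constant learning rate are assumptions, and the intercept is modelled by a constant 1.0 feature column (how ml_algo's own intercept options work internally is not shown here). Each step moves the weights along the gradient of the log-likelihood, sum_i (y_i - p_i) * x_i, where p_i is the sigmoid of the weighted sum.

```dart
import 'dart:math' as math;

double sigmoid(double z) => 1 / (1 + math.exp(-z));

double dot(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// Hypothetical sketch: batch gradient ascent on the log-likelihood.
/// Rows of [x] are observations, [y] holds labels 0.0 or 1.0.
List<double> fitLogistic(List<List<double>> x, List<double> y,
    {double learningRate = 0.1, int iterations = 1000}) {
  final nFeatures = x.first.length;
  final w = List<double>.filled(nFeatures, 0.0);
  for (var it = 0; it < iterations; it++) {
    // Gradient of the log-likelihood: sum_i (y_i - p_i) * x_i.
    final grad = List<double>.filled(nFeatures, 0.0);
    for (var i = 0; i < x.length; i++) {
      final p = sigmoid(dot(w, x[i]));
      for (var j = 0; j < nFeatures; j++) {
        grad[j] += (y[i] - p) * x[i][j];
      }
    }
    // Ascent step: move along the gradient to increase the log-likelihood.
    for (var j = 0; j < nFeatures; j++) {
      w[j] += learningRate * grad[j] / x.length;
    }
  }
  return w;
}

/// Predicts class 1 when the estimated probability is at least 0.5.
int predictClass(List<double> w, List<double> row) =>
    sigmoid(dot(w, row)) >= 0.5 ? 1 : 0;
```

For example, with rows `[1.0, -2.0]`, `[1.0, -1.0]`, `[1.0, 1.0]`, `[1.0, 2.0]` (first column is the constant intercept feature) and labels `0, 0, 1, 1`, the fitted weights classify `[1.0, 1.5]` as 1.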


Softmax regression
An algorithm that performs linear multiclass classification.

SoftmaxRegressor.gradient. Softmax regression with gradient ascent optimization of the log-likelihood cost function. To use this kind of classifier, your data should be (close to) linearly separable.

SoftmaxRegressor.coordinate. Not implemented yet. Softmax regression with coordinate descent optimization of the negated log-likelihood cost function. As in the case of logistic regression, coordinate descent allows feature selection to be performed (via L1 regularization). To use this kind of classifier, your data should be (close to) linearly separable.
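Softmax regression generalizes logistic regression to several classes: each class gets its own weight vector, and the softmax function turns the per-class scores into probabilities. The pure-Dart sketch below (hypothetical helper names, not the ml_algo API) shows the numerically stable softmax, prediction of the most probable class, and the per-observation gradient of the log-likelihood for class c, (y_c - p_c) * x, which a gradient ascent optimizer would follow. Labels are assumed to be one-hot encoded.

```dart
import 'dart:math' as math;

double dot(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// Numerically stable softmax: subtracting the maximum score before
/// exponentiating avoids overflow and does not change the result.
List<double> softmax(List<double> scores) {
  final maxScore = scores.reduce((a, b) => math.max(a, b));
  final exps = scores.map((s) => math.exp(s - maxScore)).toList();
  final sum = exps.reduce((a, b) => a + b);
  return exps.map((e) => e / sum).toList();
}

/// Class probabilities for one observation; [weights] holds one
/// weight vector per class (hypothetical layout).
List<double> classProbabilities(List<List<double>> weights, List<double> row) =>
    softmax([for (final w in weights) dot(w, row)]);

/// Index of the most probable class.
int predictClass(List<List<double>> weights, List<double> row) {
  final p = classProbabilities(weights, row);
  var best = 0;
  for (var c = 1; c < p.length; c++) {
    if (p[c] > p[best]) best = c;
  }
  return best;
}

/// Gradient of the log-likelihood with respect to the weights of class
/// [c] for one observation with one-hot label [y]: (y_c - p_c) * x.
List<double> logLikelihoodGradient(
    List<List<double>> weights, List<double> row, List<double> y, int c) {
  final p = classProbabilities(weights, row);
  return [for (final xj in row) (y[c] - p[c]) * xj];
}
```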




Regression algorithms

Linear regression

LinearRegressor.gradient. A well-known algorithm that performs linear regression using the gradient vector of a cost function.

LinearRegressor.coordinate. An algorithm that uses coordinate descent to find the optimal value of a cost function. Coordinate descent allows feature selection to be performed along with the regression process (this technique is often called Lasso regression).
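The feature selection effect of coordinate descent comes from the soft-thresholding step of the Lasso objective (1 / 2n) * ||y - Xw||^2 + lambda * ||w||_1. The sketch below is a plain cyclic coordinate descent written for illustration, not ml_algo's implementation; `fitLasso`, its `lambda` default, the fixed iteration count and the absence of an intercept (the data is assumed to be centered) are all assumptions. Each coefficient is updated in turn, and any coefficient whose partial correlation with the residual stays below lambda becomes exactly zero.

```dart
/// Soft-thresholding operator S(a, lambda) = sign(a) * max(|a| - lambda, 0).
double softThreshold(double a, double lambda) {
  if (a > lambda) return a - lambda;
  if (a < -lambda) return a + lambda;
  return 0.0;
}

/// Hypothetical sketch: cyclic coordinate descent for the Lasso objective
/// (1 / 2n) * ||y - Xw||^2 + lambda * ||w||_1, without an intercept.
List<double> fitLasso(List<List<double>> x, List<double> y,
    {double lambda = 0.1, int iterations = 100}) {
  final n = x.length;
  final m = x.first.length;
  final w = List<double>.filled(m, 0.0);
  for (var it = 0; it < iterations; it++) {
    for (var j = 0; j < m; j++) {
      var rho = 0.0;
      var z = 0.0;
      for (var i = 0; i < n; i++) {
        // Prediction without the contribution of feature j.
        var predWithoutJ = 0.0;
        for (var k = 0; k < m; k++) {
          if (k != j) predWithoutJ += x[i][k] * w[k];
        }
        rho += x[i][j] * (y[i] - predWithoutJ);
        z += x[i][j] * x[i][j];
      }
      rho /= n;
      z /= n;
      // Coefficients with |rho| <= lambda become exactly zero:
      // this is the feature selection effect.
      w[j] = z == 0 ? 0.0 : softThreshold(rho, lambda) / z;
    }
  }
  return w;
}
```

For example, with features `[1, 1]`, `[2, -1]`, `[3, 1]`, `[4, -1]` and targets `2, 4, 6, 8` (the target depends only on the first feature), `lambda: 0.1` drives the second coefficient to exactly 0.0, while `lambda: 0.0` recovers the exact fit `[2.0, 0.0]`.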


Nonparametric regression
 ParameterlessRegressor.knn
An algorithm that makes a prediction for each new observation based on the k closest observations from the training data. It has quite high computational complexity, but at the same time it can easily capture nonlinear patterns in the data.
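The prediction rule of k nearest neighbours regression, and the reason for its computational cost, can be sketched as follows. This is an illustrative pure-Dart version, not ml_algo's implementation: `knnPredict` is a hypothetical name, and Euclidean distance with an unweighted mean of the neighbours' labels are assumptions (the library may use a different distance or weighting). Every single prediction scans and sorts the whole training set.

```dart
import 'dart:math' as math;

double euclideanDistance(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final d = a[i] - b[i];
    sum += d * d;
  }
  return math.sqrt(sum);
}

/// Hypothetical sketch: predicts the outcome for [query] as the mean label
/// of its [k] nearest training observations.
double knnPredict(List<List<double>> trainX, List<double> trainY,
    List<double> query, int k) {
  if (k < 1 || k > trainX.length) {
    throw ArgumentError('k must be in range [1, ${trainX.length}]');
  }
  final indices = List<int>.generate(trainX.length, (i) => i);
  // Sorting all training rows by distance costs O(n log n) per query.
  indices.sort((i, j) => euclideanDistance(trainX[i], query)
      .compareTo(euclideanDistance(trainX[j], query)));
  var sum = 0.0;
  for (final i in indices.take(k)) {
    sum += trainY[i];
  }
  return sum / k;
}
```

For example, with training points `0, 1, 2, 10` and labels `0, 1, 4, 100`, a query at `1.1` with `k = 2` averages the labels of points `1` and `2` and returns `2.5`.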
Examples
Logistic regression
Let's classify records from the well-known Pima Indians Diabetes Database via the logistic regressor.
Import all necessary packages. First, make sure that you have the ml_preprocessing package in your dependencies:
dependencies:
  ml_preprocessing: ^3.2.0
We need this package to parse raw data in order to use it further. For more details, please visit the ml_preprocessing repository page.
import 'dart:async';
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';
Download the dataset from the Pima Indians Diabetes Database and read it (of course, you should provide a proper path to your downloaded file):
final data = DataFrame.fromCsv('datasets/pima_indians_diabetes_database.csv',
    labelName: 'class variable (0 or 1)');
// Normalize the matrix column-wise to reach computational stability and
// to provide a uniform scale for all the values in each column
final features = (await data.features)
    .mapColumns((column) => column.normalize());
final labels = await data.labels;
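Column-wise normalization can be illustrated without any library. The sketch below assumes that `normalize()` divides a column by its Euclidean (L2) norm, which is our reading of ml_linalg's default behaviour and worth checking against the ml_linalg documentation; `normalizeColumns` is a hypothetical helper working on plain nested lists instead of the Matrix type. After the transformation every non-zero column has unit length, so features measured on very different scales contribute comparably.

```dart
import 'dart:math' as math;

/// Hypothetical helper: divides every column of [rows] by its Euclidean
/// norm. All-zero columns are left unchanged to avoid division by zero.
List<List<double>> normalizeColumns(List<List<double>> rows) {
  final nCols = rows.first.length;
  final norms = List<double>.filled(nCols, 0.0);
  // Accumulate the sum of squares of every column.
  for (final row in rows) {
    for (var j = 0; j < nCols; j++) {
      norms[j] += row[j] * row[j];
    }
  }
  for (var j = 0; j < nCols; j++) {
    norms[j] = math.sqrt(norms[j]);
  }
  return [
    for (final row in rows)
      [
        for (var j = 0; j < nCols; j++)
          norms[j] == 0 ? row[j] : row[j] / norms[j]
      ]
  ];
}
```

For example, the column `[3.0, 4.0]` has norm 5 and becomes `[0.6, 0.8]`.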
The data in this file is represented by 768 records and 8 features. The 9th column is the label column; it contains either 0 or 1 on each row. This column is our target: we should predict a class label for each observation. Therefore, we should specify where to get the label values. Let's use the labelName parameter for that (the label column name, 'class variable (0 or 1)' in our case).
Processed features and labels are contained in data structures of the Matrix type. To get more information about the Matrix type, please visit the ml_linalg repo.
Then, we should create an instance of the CrossValidator class for fitting the hyperparameters of our model:
final validator = CrossValidator.kFold(numberOfFolds: 5);
All is set, so we can do our classification.
Evaluate the model via the accuracy metric:
final accuracy = validator.evaluate(
    (trainFeatures, trainLabels) => LogisticRegressor.gradient(
          trainFeatures, trainLabels,
          initialLearningRate: .8,
          iterationsLimit: 500,
          batchSize: 768,
          fitIntercept: true,
          interceptScale: .1,
          learningRateType: LearningRateType.constant),
    features, labels, MetricType.accuracy);
Let's print the score:
print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');
We will see something like this:
accuracy on classification: 0.77
All the code above together:
import 'dart:async';
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';
Future<void> main() async {
  final data = DataFrame.fromCsv('datasets/pima_indians_diabetes_database.csv',
      labelName: 'class variable (0 or 1)');
  final features = (await data.features)
      .mapColumns((column) => column.normalize());
  final labels = await data.labels;
  final validator = CrossValidator.kFold(numberOfFolds: 5);
  final accuracy = validator.evaluate(
      (trainFeatures, trainLabels) => LogisticRegressor.gradient(
            trainFeatures, trainLabels,
            initialLearningRate: .8,
            iterationsLimit: 500,
            batchSize: 768,
            fitIntercept: true,
            interceptScale: .1,
            learningRateType: LearningRateType.constant),
      features, labels, MetricType.accuracy);
  print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');
}
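The accuracy metric used in this example is simply the share of observations whose predicted label matches the actual one. A minimal sketch follows; `accuracy` here is a hypothetical standalone function over plain lists, not the ml_algo metric implementation, and in cross-validation such a score would be computed on each test fold and then averaged.

```dart
/// Hypothetical helper: share of observations whose predicted label
/// equals the actual label.
double accuracy(List<int> predicted, List<int> actual) {
  if (predicted.length != actual.length || actual.isEmpty) {
    throw ArgumentError('Label lists must be non-empty and of equal length');
  }
  var correct = 0;
  for (var i = 0; i < actual.length; i++) {
    if (predicted[i] == actual[i]) correct++;
  }
  return correct / actual.length;
}
```

For example, `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` is 0.75, since three of the four predictions are correct.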
Softmax regression
Let's classify another famous dataset, the Iris dataset. The data in this CSV file is separated into 3 classes, therefore we need a different approach to classification: softmax regression.
As usual, start with data preparation. Before we start, we should update our pubspec's dependencies with the xrange library:
dependencies:
  ...
  xrange: ^0.0.5
  ...
Download the file and read it:
final data = DataFrame.fromCsv('datasets/iris.csv',
  labelName: 'Species',
  columns: [ZRange.closed(1, 5)],
  categories: {
    'Species': CategoricalDataEncoderType.oneHot,
  },
);
final features = await data.features;
final labels = await data.labels;
The CSV file has 6 columns, but we need to get rid of the first column, because it contains just the ID of every observation, which is useless for prediction. So, as you may notice, we provided a column range to exclude the ID column:
columns: [ZRange.closed(1, 5)]
Also, since the label column 'Species' contains categorical data, we encoded it into a numerical format:
categories: {
  'Species': CategoricalDataEncoderType.oneHot,
},
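One-hot encoding replaces a categorical column with one numeric column per category, holding 1.0 for the observation's category and 0.0 elsewhere, which is what softmax regression needs as a multiclass target. The sketch below is a hypothetical standalone helper, not the ml_preprocessing encoder; in particular, ordering the new columns by first appearance of each category is an assumption of this sketch, and the library may order them differently.

```dart
/// Hypothetical helper: one-hot encodes categorical [values]. Each distinct
/// category gets its own column, in order of first appearance.
List<List<double>> oneHotEncode(List<String> values) {
  final categories = <String>[];
  for (final v in values) {
    if (!categories.contains(v)) categories.add(v);
  }
  return [
    for (final v in values)
      [for (final c in categories) c == v ? 1.0 : 0.0]
  ];
}
```

For example, `['setosa', 'versicolor', 'setosa', 'virginica']` becomes four rows with three columns each, and the first row is `[1.0, 0.0, 0.0]`.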
Next step: create a cross-validator instance:
final validator = CrossValidator.kFold(numberOfFolds: 5);
Evaluate the quality of prediction:
final accuracy = validator.evaluate(
    (trainFeatures, trainLabels) => LinearClassifier.softmaxRegressor(
          trainFeatures, trainLabels,
          initialLearningRate: 0.03,
          iterationsLimit: null,
          minWeightsUpdate: 1e-6,
          randomSeed: 46,
          learningRateType: LearningRateType.constant),
    features, labels, MetricType.accuracy);
print('Iris dataset, softmax regression: accuracy is '
    '${accuracy.toStringAsFixed(2)}'); // It yields 0.93
Gather all the code above together:
import 'dart:async';
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';
import 'package:xrange/zrange.dart';
Future<void> main() async {
  final data = DataFrame.fromCsv('datasets/iris.csv',
    labelName: 'Species',
    columns: [ZRange.closed(1, 5)],
    categories: {
      'Species': CategoricalDataEncoderType.oneHot,
    },
  );
  final features = await data.features;
  final labels = await data.labels;
  final validator = CrossValidator.kFold(numberOfFolds: 5);
  final accuracy = validator.evaluate(
      (trainFeatures, trainLabels) => LinearClassifier.softmaxRegressor(
            trainFeatures, trainLabels,
            initialLearningRate: 0.03,
            iterationsLimit: null,
            minWeightsUpdate: 1e-6,
            randomSeed: 46,
            learningRateType: LearningRateType.constant),
      features, labels, MetricType.accuracy);
  print('Iris dataset, softmax regression: accuracy is '
      '${accuracy.toStringAsFixed(2)}');
}
K nearest neighbours regression
Let's do some prediction with a well-known nonparametric regression algorithm, k nearest neighbours. Let's take a classic dataset: Boston Housing.
As usual, import all necessary packages:
import 'dart:async';
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';
import 'package:xrange/zrange.dart';
Then download and read the data:
final data = DataFrame.fromCsv('lib/_datasets/housing.csv',
  headerExists: false,
  fieldDelimiter: ' ',
  labelIdx: 13,
);
As you can see, the dataset is headless, which means there is no descriptive header line at the beginning of the file; hence we use the index-based approach to point out where the outcome column resides (index 13 in our case).
Extract features and labels:
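Reading a headless, whitespace-delimited file by column index can be illustrated in plain Dart. The helpers below are hypothetical and unrelated to DataFrame.fromCsv: `parseRow` splits a line on runs of whitespace (tolerating several spaces between fields, which is an assumption about the file layout), and `featuresOf` keeps every value except the one at the label index. The sample line in the example is made up for illustration.

```dart
/// Hypothetical helper: splits one line of a headless, whitespace-delimited
/// file into numbers, tolerating runs of several spaces between fields.
List<double> parseRow(String line) => line
    .trim()
    .split(RegExp(r'\s+'))
    .map((field) => double.parse(field))
    .toList();

/// Hypothetical helper: every value of [row] except the label at [labelIdx].
List<double> featuresOf(List<double> row, int labelIdx) => [
      for (var i = 0; i < row.length; i++)
        if (i != labelIdx) row[i]
    ];
```

For example, `parseRow('  1.5  2  3.25   4 ')` yields `[1.5, 2.0, 3.25, 4.0]`; with a label index of 3, the label is `4.0` and the features are `[1.5, 2.0, 3.25]`.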
// As in the example above, normalize the matrix column-wise to reach
// computational stability and provide a uniform scale for all the values
// in each column
final features = (await data.features)
    .mapColumns((column) => column.normalize());
final labels = await data.labels;
Create a cross-validator instance:
final validator = CrossValidator.kFold(numberOfFolds: 5);
Let the k parameter be equal to 4.
Assess a KNN regressor with the chosen k value using the MAPE metric:
final error = validator.evaluate(
    (trainFeatures, trainLabels) =>
        ParameterlessRegressor.knn(trainFeatures, trainLabels, k: 4),
    features, labels, MetricType.mape);
Let's print our error:
print('MAPE error on k-fold validation: ${error.toStringAsFixed(2)}%'); // it yields approx. 6.18
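MAPE (mean absolute percentage error) averages the absolute error of each prediction relative to the actual value. The sketch below is a hypothetical standalone function, not the ml_algo metric; it assumes the result is expressed in percent, as the '%' sign in the print statement suggests. Note that MAPE is undefined when an actual value is zero.

```dart
/// Hypothetical helper: mean absolute percentage error, in percent:
/// MAPE = 100 / n * sum(|actual - predicted| / |actual|).
double mape(List<double> actual, List<double> predicted) {
  if (actual.length != predicted.length || actual.isEmpty) {
    throw ArgumentError('Lists must be non-empty and of equal length');
  }
  var sum = 0.0;
  for (var i = 0; i < actual.length; i++) {
    if (actual[i] == 0) {
      throw ArgumentError('MAPE is undefined for a zero actual value');
    }
    sum += (actual[i] - predicted[i]).abs() / actual[i].abs();
  }
  return 100 * sum / actual.length;
}
```

For example, predicting 110 for an actual 100 and 180 for an actual 200 is off by 10% both times, so the MAPE is about 10.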
Contacts
If you have questions, feel free to write me on