LogisticRegressor.SGD constructor

LogisticRegressor.SGD(
  DataFrame trainingData,
  String targetName, {
  required LearningRateType learningRateType,
  int iterationsLimit = iterationLimitDefaultValue,
  double initialLearningRate = initialLearningRateDefaultValue,
  double decay = decayDefaultValue,
  int dropRate = dropRateDefaultValue,
  double minCoefficientsUpdate = minCoefficientsUpdateDefaultValue,
  double probabilityThreshold = probabilityThresholdDefaultValue,
  double lambda = lambdaDefaultValue,
  bool fitIntercept = fitInterceptDefaultValue,
  double interceptScale = interceptScaleDefaultValue,
  InitialCoefficientsType initialCoefficientsType = initialCoefficientsTypeDefaultValue,
  num positiveLabel = positiveLabelDefaultValue,
  num negativeLabel = negativeLabelDefaultValue,
  bool collectLearningData = collectLearningDataDefaultValue,
  DType dtype = dTypeDefaultValue,
  Vector? initialCoefficients,
  int? seed,
})
Creates a LogisticRegressor instance based on the Stochastic Gradient Descent algorithm
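In standard logistic regression notation (a sketch; the symbols below are not taken from the library's internals), the classifier models the probability of the positive class as

$$P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}},$$

and stochastic gradient descent updates the coefficients using one randomly selected observation $(x_i, y_i)$ per iteration:

$$w_{t+1} = w_t - \eta_t \, \nabla_w J(w_t;\, x_i, y_i),$$

where $\eta_t$ is the learning rate governed by learningRateType and initialLearningRate.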
Parameters:
trainingData
Observations that will be used by the classifier to learn the
coefficients. Must contain a targetName column.
targetName
A string that serves as the name of the target column (the
column that contains class labels or outcomes for the associated
features).
learningRateType
A value defining a strategy for the learning rate
behaviour throughout the whole fitting process.
iterationsLimit
The maximum number of fitting iterations. Used as a convergence
criterion in the optimization algorithm. Default value is 100.
initialLearningRate
The initial value defining the velocity of convergence of the
gradient descent optimizer. Default value is 1e-3.
decay
A value defining the speed of the learning rate decrease. Applicable only
to the LearningRateType.timeBased, LearningRateType.stepBased, and
LearningRateType.exponential strategies.
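The strategies mentioned above are commonly defined as follows (textbook formulas given as a sketch; the library's exact definitions may differ):

$$\eta_t = \frac{\eta_0}{1 + d\,t} \;\;\text{(time-based)}, \qquad \eta_t = \eta_0\, d^{\lfloor (1 + t)/r \rfloor} \;\;\text{(step-based)}, \qquad \eta_t = \eta_0\, e^{-d\,t} \;\;\text{(exponential)},$$

where $\eta_0$ is initialLearningRate, $d$ is decay, $r$ is dropRate, and $t$ is the iteration number.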
dropRate
The number of learning iterations after which the learning rate
is decreased. The value is applicable only to the
LearningRateType.stepBased strategy; it is ignored for the other
learning rate strategies.
minCoefficientsUpdate
The minimum distance between coefficient vectors from
two consecutive iterations. Used as a convergence criterion in the
optimization algorithm: if the difference between the two vectors is small
enough, there is no reason to continue fitting. Default value is 1e-12.
probabilityThreshold
A probability threshold used to decide whether an observation belongs
to the positive class (see the positiveLabel parameter) or to the
negative class (see the negativeLabel parameter). The greater the
threshold, the stricter the classifier. Default value is 0.5.
lambda
A regularization coefficient. Used to prevent the regressor from
overfitting: the greater the value of lambda, the more the coefficients
of the equation of the predicting hyperplane are shrunk. An extremely
large lambda may shrink the coefficients to nothing, while a too small
lambda may lead to excessively large absolute values of the coefficients,
which is also undesirable.
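Since the constructor passes RegularizationType.L2 under the hood (see the Implementation section), lambda scales an L2 penalty on the coefficients. A common form of such a loss, given as a sketch (the exact scaling of the penalty term inside the library is an assumption):

$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\bigl(1 - \sigma(w^\top x_i)\bigr) \Bigr] + \lambda \lVert w \rVert_2^2$$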
seed
A seed value used to generate random indices for selecting rows from
trainingData. Specify this value if the classifier must produce the same
result on every training run.
fitIntercept
Whether or not to fit an intercept term. In 2-dimensional space, the
intercept is the bias of the line relative to the X-axis. Default value
is false.
interceptScale
A value defining the scale of the intercept.
initialCoefficientsType
Defines how the coefficients are autogenerated for the first
optimization iteration. By default, all the autogenerated coefficients
are equal to zero. If initialCoefficients are provided, this parameter
is ignored.
initialCoefficients
Coefficients to be used in the first iteration of the optimization
algorithm. initialCoefficients is a vector whose length must be equal
to the number of features in trainingData. In the case of logistic
regression, exactly one column of trainingData serves as the prediction
target, so the number of features equals the number of columns in
trainingData minus 1 (the target column). Keep in mind that if the
model fits an intercept term, initialCoefficients must contain an extra
element at the beginning of the vector denoting the intercept
coefficient.
positiveLabel
A value that will be used for the positive class label.
Default value is 1.
negativeLabel
A value that will be used for the negative class label.
Default value is 0.
collectLearningData
Whether or not to collect learning data, for instance the cost
function value per iteration. Significantly affects performance. If
collectLearningData is true, one may access the costPerIteration getter
to evaluate the learning process more thoroughly. Default value is
false.
dtype
A data type for all the numeric values used by the algorithm. Can
affect the performance and accuracy of the computations. Default value
is DType.float32.
Example:
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
  final samples = getPimaIndiansDiabetesDataFrame().shuffle();
  final model = LogisticRegressor.SGD(
    samples,
    'Outcome',
    seed: 10,
    iterationsLimit: 50,
    initialLearningRate: 1e-4,
    learningRateType: LearningRateType.constant,
  );
}
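To inspect the fitting process, the collectLearningData flag described above can be combined with the costPerIteration getter; a sketch extending the example (same dataset and parameters):

```dart
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
  final samples = getPimaIndiansDiabetesDataFrame().shuffle();
  final model = LogisticRegressor.SGD(
    samples,
    'Outcome',
    seed: 10,
    iterationsLimit: 50,
    initialLearningRate: 1e-4,
    learningRateType: LearningRateType.constant,
    // collecting learning data affects performance, so enable it
    // only when the cost dynamics are actually needed
    collectLearningData: true,
  );

  // costPerIteration is populated only when collectLearningData is true;
  // a (roughly) decreasing sequence indicates that fitting converges
  print(model.costPerIteration);
}
```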
Keep in mind that you need to select a proper learning rate strategy for
every particular model. For more details, refer to LearningRateType;
also consider the decay and dropRate parameters.
Implementation
factory LogisticRegressor.SGD(
  DataFrame trainingData,
  String targetName, {
  required LearningRateType learningRateType,
  int iterationsLimit = iterationLimitDefaultValue,
  double initialLearningRate = initialLearningRateDefaultValue,
  double decay = decayDefaultValue,
  int dropRate = dropRateDefaultValue,
  double minCoefficientsUpdate = minCoefficientsUpdateDefaultValue,
  double probabilityThreshold = probabilityThresholdDefaultValue,
  double lambda = lambdaDefaultValue,
  bool fitIntercept = fitInterceptDefaultValue,
  double interceptScale = interceptScaleDefaultValue,
  InitialCoefficientsType initialCoefficientsType =
      initialCoefficientsTypeDefaultValue,
  num positiveLabel = positiveLabelDefaultValue,
  num negativeLabel = negativeLabelDefaultValue,
  bool collectLearningData = collectLearningDataDefaultValue,
  DType dtype = dTypeDefaultValue,
  Vector? initialCoefficients,
  int? seed,
}) =>
    initLogisticRegressorModule().get<LogisticRegressorFactory>().create(
          trainData: trainingData,
          targetName: targetName,
          optimizerType: LinearOptimizerType.gradient,
          iterationsLimit: iterationsLimit,
          initialLearningRate: initialLearningRate,
          decay: decay,
          dropRate: dropRate,
          minCoefficientsUpdate: minCoefficientsUpdate,
          probabilityThreshold: probabilityThreshold,
          lambda: lambda,
          regularizationType: RegularizationType.L2,
          randomSeed: seed,
          batchSize: 1,
          fitIntercept: fitIntercept,
          interceptScale: interceptScale,
          isFittingDataNormalized: false,
          learningRateType: learningRateType,
          initialCoefficientsType: initialCoefficientsType,
          initialCoefficients: initialCoefficients ?? Vector.empty(dtype: dtype),
          positiveLabel: positiveLabel,
          negativeLabel: negativeLabel,
          collectLearningData: collectLearningData,
          dtype: dtype,
        );