AdamW class

Implements the AdamW optimizer.

AdamW is a variant of the Adam optimizer that improves its handling of L2 regularization (weight decay). In standard Adam, weight decay is coupled with the adaptive learning rate, which can lead to suboptimal performance. AdamW decouples the weight decay from the gradient-based update, applying it directly to the weights.

This often leads to better model generalization and has become the default optimizer for training large models like Transformers.
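Concretely, with learning rate α, weight-decay coefficient λ, and bias-corrected moment estimates m̂ₜ and v̂ₜ, the two updates differ as follows (this is the standard formulation of decoupled weight decay, not code from this library):

Adam with L2 regularization folds the decay into the gradient:

  gₜ = ∇L(wₜ) + λ·wₜ
  wₜ₊₁ = wₜ − α · m̂ₜ / (√v̂ₜ + ε)

AdamW applies the decay directly to the weights:

  wₜ₊₁ = wₜ − α · ( m̂ₜ / (√v̂ₜ + ε) + λ·wₜ )

Because the λ·wₜ term in AdamW is outside the adaptive denominator √v̂ₜ + ε, every weight decays at the same relative rate regardless of its gradient history.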

Analogy 🧠

Think of training as driving a car and weight decay as a tax:

  • Adam (with L2 regularization): The tax is bundled with your fuel cost, so the adaptive scaling that normalizes your acceleration (gradients) also distorts how much tax you effectively pay.
  • AdamW: The tax is paid separately, based only on your weight values, regardless of your speed (gradient).

Example

var optimizer = AdamW(model.parameters, learningRate: 0.001, weightDecay: 0.01);
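
A fuller usage sketch, assuming a hypothetical model with a forward/backward API — only AdamW's own members (the constructor, zeroGrad(), and step()) are taken from this page:

```dart
// `model`, `model.forward`, `loss.backward()`, `numEpochs`, and
// `trainingData` are placeholders for whatever autograd API surrounds
// this optimizer; they are assumptions, not part of the documented class.
final optimizer = AdamW(
  model.parameters,
  learningRate: 0.001,
  weightDecay: 0.01,
);

for (var epoch = 0; epoch < numEpochs; epoch++) {
  for (final batch in trainingData) {
    optimizer.zeroGrad();              // reset gradients from the last step
    final loss = model.forward(batch); // hypothetical forward pass
    loss.backward();                   // hypothetical backprop to fill gradients
    optimizer.step();                  // decoupled AdamW parameter update
  }
}
```

Note that zeroGrad() is called before each backward pass; otherwise gradients from the previous batch would accumulate into the current update.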
Inheritance

Constructors

AdamW(List<Tensor> parameters, {required double learningRate, double beta1 = 0.9, double beta2 = 0.999, double epsilon = 1e-8, double weightDecay = 0.01})

Properties

beta1 → double
The exponential decay rate for the first-moment (gradient mean) estimates.
final
beta2 → double
The exponential decay rate for the second-moment (uncentered variance) estimates.
final
epsilon → double
A small constant added to the denominator for numerical stability.
final
hashCode → int
The hash code for this object.
no setter, inherited
learningRate → double
The step size for the gradient updates.
final, inherited
parameters → List<Tensor>
The list of model parameters (weights and biases) that this optimizer will update.
final, inherited
runtimeType → Type
A representation of the runtime type of the object.
no setter, inherited
weightDecay → double
The decoupled weight-decay coefficient, applied directly to the weights at each step.
final

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
step() → void
Performs a single optimization step according to the AdamW update rule.
override
toString() → String
A string representation of this object.
inherited
zeroGrad() → void
Resets the gradients of all parameters to zero.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited