AdamW class
Implements the AdamW optimizer.
AdamW is a variant of the Adam optimizer that fixes how weight decay interacts with the adaptive learning rate. In standard Adam, weight decay implemented as L2 regularization is folded into the gradient and therefore scaled by the per-parameter adaptive learning rate, which can lead to suboptimal regularization. AdamW decouples the weight decay from the gradient-based update, applying it directly to the weights.
This often leads to better model generalization and has become the default optimizer for training large models like Transformers.
Analogy 🧠
Think of training as driving a car and weight decay as a tax:
- Adam: The tax is bundled with your fuel cost. When you accelerate hard (large gradients), your tax also increases.
- AdamW: The tax is paid separately, based only on your weight values, regardless of your speed (gradient).
Example
var optimizer = AdamW(model.parameters, learningRate: 0.001, weightDecay: 0.01);
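To make the decoupled decay concrete, here is a minimal, framework-free sketch of the AdamW update rule (Python with NumPy, purely for illustration; the parameter names mirror this class's properties, but this is a sketch of the published rule, not this library's source):

```python
import numpy as np

def adamw_step(w, grad, m, v, t, learning_rate=0.001, beta1=0.9,
               beta2=0.999, epsilon=1e-8, weight_decay=0.01):
    """One AdamW step. Returns the updated (w, m, v)."""
    # Adam moment estimates; note weight decay is NOT added to the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Gradient-based update plus decoupled decay applied directly to w.
    w = w - learning_rate * (m_hat / (np.sqrt(v_hat) + epsilon)
                             + weight_decay * w)
    return w, m, v
```

The decay term `weight_decay * w` sits outside the adaptive `m_hat / sqrt(v_hat)` ratio, so it is not rescaled by the gradient history; with a zero gradient, the weights still shrink by `learning_rate * weight_decay * w`.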
Constructors
Properties
- beta1 → double
-
The exponential decay rate for the first-moment (mean) estimate of the gradients.
final
- beta2 → double
-
The exponential decay rate for the second-moment (uncentered variance) estimate of the gradients.
final
- epsilon → double
-
A small constant added to the denominator for numerical stability.
final
- hashCode → int
-
The hash code for this object.
no setter · inherited
- learningRate → double
-
The step size for the gradient updates.
final · inherited
- parameters → List<Tensor>
-
The list of model parameters (weights and biases) that this optimizer will update.
final · inherited
- runtimeType → Type
-
A representation of the runtime type of the object.
no setter · inherited
- weightDecay → double
-
The decoupled weight-decay coefficient, applied directly to the weights on each step.
final
Methods
- noSuchMethod(Invocation invocation) → dynamic
-
Invoked when a nonexistent method or property is accessed.
inherited
- step() → void
-
Performs a single optimization step according to the AdamW update rule.
override
- toString() → String
-
A string representation of this object.
inherited
- zeroGrad() → void
-
Resets the gradients of all parameters to zero.
inherited
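In use, step() and zeroGrad() follow the usual optimizer contract: clear the accumulated gradients, recompute them, then apply one update. A hypothetical, self-contained Python model of that loop on f(w) = w² (the `Param` and `ToyAdamW` names are illustrative stand-ins, not this library's API):

```python
import math

class Param:
    """Toy parameter holding a value and an accumulated gradient."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

class ToyAdamW:
    """Toy stand-in mirroring the documented surface: parameters, step, zeroGrad."""
    def __init__(self, parameters, learning_rate=0.001, beta1=0.9,
                 beta2=0.999, epsilon=1e-8, weight_decay=0.01):
        self.parameters = parameters
        self.lr, self.b1, self.b2 = learning_rate, beta1, beta2
        self.eps, self.wd = epsilon, weight_decay
        self.t = 0
        self.m = [0.0] * len(parameters)
        self.v = [0.0] * len(parameters)

    def zero_grad(self):
        # Reset accumulated gradients before the next backward pass.
        for p in self.parameters:
            p.grad = 0.0

    def step(self):
        # One AdamW update: adaptive gradient step + decoupled decay.
        self.t += 1
        for i, p in enumerate(self.parameters):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * p.grad
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * p.grad ** 2
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            p.value -= self.lr * (m_hat / (math.sqrt(v_hat) + self.eps)
                                  + self.wd * p.value)

w = Param(5.0)
opt = ToyAdamW([w], learning_rate=0.1, weight_decay=0.01)
for _ in range(500):
    opt.zero_grad()
    w.grad = 2 * w.value  # analytic gradient of f(w) = w**2
    opt.step()
# w.value should now be near the minimizer at 0.
```

The loop shape is the point here: zeroGrad must run before each backward pass, or gradients from previous iterations would accumulate into the next step.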
Operators
- operator ==(Object other) → bool
-
The equality operator.
inherited