AdamW class

Implements the AdamW optimizer.

AdamW is a variant of the Adam optimizer that improves its handling of L2 regularization (weight decay). In standard Adam, weight decay is coupled with the adaptive learning rate, which can lead to suboptimal performance. AdamW decouples the weight decay from the gradient-based update, applying it directly to the weights.

This often leads to better model generalization and has become the default optimizer for training large models like Transformers.
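Concretely, with learning rate α, weight-decay coefficient λ, and bias-corrected moment estimates m̂ₜ and v̂ₜ, the two updates differ as follows (this is the standard formulation of decoupled weight decay, not code from this library):

Adam with L2 regularization folds the decay into the gradient:

  gₜ = ∇L(wₜ) + λ·wₜ
  wₜ₊₁ = wₜ − α · m̂ₜ / (√v̂ₜ + ε)

AdamW applies the decay directly to the weights:

  wₜ₊₁ = wₜ − α · ( m̂ₜ / (√v̂ₜ + ε) + λ·wₜ )

Because the λ·wₜ term in AdamW is outside the adaptive denominator √v̂ₜ + ε, every weight decays at the same relative rate regardless of its gradient history.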

Analogy 🧠

Think of training as driving a car and weight decay as a tax:

  • Adam (with L2 regularization): The tax is bundled with your fuel cost, so the adaptive scaling that normalizes your acceleration (gradients) also distorts how much tax you effectively pay.
  • AdamW: The tax is paid separately, based only on your weight values, regardless of your speed (gradient).

Example

var optimizer = AdamW(model.parameters, learningRate: 0.001, weightDecay: 0.01);
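
A fuller usage sketch, assuming a hypothetical model with a forward/backward API — only AdamW's own members (the constructor, zeroGrad(), and step()) are taken from this page:

```dart
// `model`, `model.forward`, `loss.backward()`, `numEpochs`, and
// `trainingData` are placeholders for whatever autograd API surrounds
// this optimizer; they are assumptions, not part of the documented class.
final optimizer = AdamW(
  model.parameters,
  learningRate: 0.001,
  weightDecay: 0.01,
);

for (var epoch = 0; epoch < numEpochs; epoch++) {
  for (final batch in trainingData) {
    optimizer.zeroGrad();              // reset gradients from the last step
    final loss = model.forward(batch); // hypothetical forward pass
    loss.backward();                   // hypothetical backprop to fill gradients
    optimizer.step();                  // decoupled AdamW parameter update
  }
}
```

Note that zeroGrad() is called before each backward pass; otherwise gradients from the previous batch would accumulate into the current update.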
Inheritance

Constructors

AdamW(List<Tensor> parameters, {required double learningRate, double beta1 = 0.9, double beta2 = 0.999, double epsilon = 1e-8, double weightDecay = 0.01})

Properties

beta1 → double
The exponential decay rate for the first-moment (gradient mean) estimates.
final
beta2 → double
The exponential decay rate for the second-moment (uncentered variance) estimates.
final
epsilon → double
A small constant added to the denominator for numerical stability.
final
hashCode → int
The hash code for this object.
no setter, inherited
learningRate → double
The step size for the gradient updates.
final, inherited
parameters → List<Tensor>
The list of model parameters (weights and biases) that this optimizer will update.
final, inherited
runtimeType → Type
A representation of the runtime type of the object.
no setter, inherited
weightDecay → double
The decoupled weight-decay coefficient, applied directly to the weights at each step.
final

Methods

noSuchMethod(Invocation invocation) → dynamic
Invoked when a nonexistent method or property is accessed.
inherited
step() → void
Performs a single optimization step according to the AdamW update rule.
override
toString() → String
A string representation of this object.
inherited
zeroGrad() → void
Resets the gradients of all parameters to zero.
inherited

Operators

operator ==(Object other) → bool
The equality operator.
inherited