update method

void update(
  int state,
  int action,
  double reward,
  int nextState,
)

Performs one Q-learning update for the transition (state, action, reward, nextState), then increments the step counter and applies the internal schedules.
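
For reference, the assignment in the implementation matches the standard tabular Q-learning rule. The mapping below is a reading of the code, not something the page states: s is state, a is action, r is reward, s' is nextState, and the max term is assumed to be what maxQ(nextState) returns (the largest Q-value over the actions available in s').

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

Here \alpha corresponds to alpha (the step size) and \gamma to gamma (the discount factor).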

Implementation

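void update(int state, int action, double reward, int nextState) {
  // Current estimate for the (state, action) pair being updated.
  final q = qTable[state][action];
  // Value returned by maxQ for the successor state (the bootstrap term).
  final maxNext = maxQ(nextState);
  // Move q toward the target (reward + gamma * maxNext) by step size alpha.
  qTable[state][action] = q + alpha * (reward + gamma * maxNext - q);
  // Count this update, then apply the internal schedules.
  _steps += 1;
  _applySchedules();
}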
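
To see the update in isolation, here is a minimal sketch that wraps the same method body in a hypothetical class. TinyQAgent, its constructor, the steps getter, the maxQ body, and the empty _applySchedules are all assumptions made for illustration; the real class may define them differently.

```dart
import 'dart:math' as math;

/// Hypothetical stand-in for the documented class. Only the body of
/// `update` is taken from the page; everything else is assumed.
class TinyQAgent {
  TinyQAgent(int nStates, int nActions, {this.alpha = 0.5, this.gamma = 0.5})
      : qTable = List.generate(
            nStates, (_) => List<double>.filled(nActions, 0.0));

  final List<List<double>> qTable;
  double alpha; // step size
  final double gamma; // discount factor
  int _steps = 0;

  int get steps => _steps;

  /// Assumed behaviour: largest Q-value over all actions in [state].
  double maxQ(int state) => qTable[state].reduce(math.max);

  /// Assumed placeholder: the real schedules are not shown on the page.
  void _applySchedules() {}

  void update(int state, int action, double reward, int nextState) {
    final q = qTable[state][action];
    final maxNext = maxQ(nextState);
    qTable[state][action] = q + alpha * (reward + gamma * maxNext - q);
    _steps += 1;
    _applySchedules();
  }
}

void main() {
  final agent = TinyQAgent(2, 2); // alpha = 0.5, gamma = 0.5
  agent.qTable[1][0] = 2.0; // best known value in the next state

  // New value: 0 + 0.5 * (1.0 + 0.5 * 2.0 - 0) = 1.0
  agent.update(0, 1, 1.0, 1);

  print(agent.qTable[0][1]);
  print(agent.steps);
}
```

With these values the entry for state 0, action 1 moves halfway from 0 toward the target of 2.0, and the step counter advances by one. A no-op _applySchedules keeps alpha fixed; a real schedule would typically adjust it between calls.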