update method
Performs one Q-learning update for the transition (state, action, reward, nextState), then increments the step counter and applies the internal schedules.
Implementation
void update(int state, int action, double reward, int nextState) {
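  // Current estimate for the (state, action) pair.
  final q = qTable[state][action];
  // Best value obtainable from the next state (greedy bootstrap).
  final maxNext = maxQ(nextState);
  // Move the estimate toward the TD target by a fraction alpha.
  qTable[state][action] = q + alpha * (reward + gamma * maxNext - q);
  // Count the step, then let the schedules react to the new count.
  _steps += 1;
  _applySchedules();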
}
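
The update rule above can be checked by hand with a small standalone sketch. `TinyQLearner`, its constructor, the `steps` getter, and the empty `_applySchedules` body are hypothetical stand-ins, not the documented class. Only the body of `update` mirrors the implementation shown above. `maxQ` is assumed to return the largest Q-value in the row for the given state.

```dart
import 'dart:math' as math;

/// Hypothetical stand-in for the documented class (illustration only).
class TinyQLearner {
  TinyQLearner(int nStates, int nActions, {this.alpha = 0.5, this.gamma = 0.5})
      : qTable = List.generate(
            nStates, (_) => List<double>.filled(nActions, 0.0));

  final List<List<double>> qTable;
  double alpha; // learning rate
  double gamma; // discount factor
  int _steps = 0;

  int get steps => _steps;

  /// Assumed behavior: largest Q-value over all actions in [state].
  double maxQ(int state) => qTable[state].reduce(math.max);

  /// Placeholder: the real schedules are not shown in the source,
  /// so this sketch leaves alpha and gamma unchanged.
  void _applySchedules() {}

  /// Same body as the documented implementation.
  void update(int state, int action, double reward, int nextState) {
    final q = qTable[state][action];
    final maxNext = maxQ(nextState);
    qTable[state][action] = q + alpha * (reward + gamma * maxNext - q);
    _steps += 1;
    _applySchedules();
  }
}

void main() {
  final agent = TinyQLearner(2, 2);
  agent.qTable[0][1] = 1.0; // current estimate Q(0, 1)
  agent.qTable[1][0] = 4.0; // best value in the next state

  // TD target = 2.0 + 0.5 * 4.0 = 4.0; TD error = 4.0 - 1.0 = 3.0
  // New Q(0, 1) = 1.0 + 0.5 * 3.0 = 2.5
  agent.update(0, 1, 2.0, 1);

  print(agent.qTable[0][1]); // 2.5
  print(agent.steps); // 1
}
```

The values here (alpha = 0.5, gamma = 0.5) are chosen only so the arithmetic is exact in floating point. They are not defaults taken from the library.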