MachineLearning/policy_gradient library
Policy Gradient (REINFORCE) - minimal numeric implementation
A concise REINFORCE-style policy gradient agent that uses an MLP policy to
output a probability distribution over a discrete action space. The
implementation is intentionally small, for teaching and unit testing (no
baseline is used by default), but it exposes the core API needed in real
experiments: selectAction and updateFromEpisode.
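The update behind selectAction and updateFromEpisode can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not this library's code. The one-hidden-layer tanh MLP, the hyperparameters (hidden, lr, gamma, seed) and the snake_case names select_action, update_from_episode and discounted_returns are hypothetical stand-ins. Each update takes one gradient-ascent step on sum_t G_t * log pi(a_t | s_t), where G_t is the discounted return from step t, with no baseline.

```python
import numpy as np


class PolicyGradient:
    """Hypothetical minimal REINFORCE agent: one-hidden-layer tanh MLP + softmax."""

    def __init__(self, obs_dim, n_actions, hidden=16, lr=0.01, gamma=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        # Small random weights; biases start at zero.
        self.W1 = self.rng.normal(0.0, 0.1, size=(obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = self.rng.normal(0.0, 0.1, size=(hidden, n_actions))
        self.b2 = np.zeros(n_actions)
        self.lr = lr
        self.gamma = gamma

    def _forward(self, obs):
        # obs: (batch, obs_dim) -> hidden activations and action probabilities
        h = np.tanh(obs @ self.W1 + self.b1)
        logits = h @ self.W2 + self.b2
        logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return h, exp / exp.sum(axis=-1, keepdims=True)

    def select_action(self, obs):
        """Sample a discrete action from pi(. | obs); returns (action, probs)."""
        _, probs = self._forward(np.atleast_2d(np.asarray(obs, dtype=float)))
        probs = probs[0]
        action = int(self.rng.choice(len(probs), p=probs))
        return action, probs

    def discounted_returns(self, rewards):
        # G_t = r_t + gamma * G_{t+1}, computed backwards over the episode
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + self.gamma * running
            returns[t] = running
        return returns

    def update_from_episode(self, observations, actions, rewards):
        """One gradient-ascent step on sum_t G_t * log pi(a_t | s_t)."""
        obs = np.asarray(observations, dtype=float)
        acts = np.asarray(actions, dtype=int)
        returns = self.discounted_returns(rewards)
        h, probs = self._forward(obs)
        # d log softmax(logits)[a] / d logits = onehot(a) - probs, weighted by G_t
        onehot = np.zeros_like(probs)
        onehot[np.arange(len(acts)), acts] = 1.0
        dlogits = (onehot - probs) * returns[:, None]
        # Backpropagate through the output layer and the tanh hidden layer
        dW2 = h.T @ dlogits
        db2 = dlogits.sum(axis=0)
        dh = (dlogits @ self.W2.T) * (1.0 - h ** 2)
        dW1 = obs.T @ dh
        db1 = dh.sum(axis=0)
        # Ascent: move parameters in the direction that increases the objective
        self.W2 += self.lr * dW2
        self.b2 += self.lr * db2
        self.W1 += self.lr * dW1
        self.b1 += self.lr * db1
        return returns
```

As a usage example, on a one-state, two-action problem where only action 1 is rewarded, repeatedly calling select_action and then update_from_episode on one-step episodes should raise the probability of action 1. Because rewards here are nonnegative and no baseline is subtracted, episodes with reward 0 produce no update at all, which is one motivation for the optional baseline.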
Classes
- PolicyGradient: REINFORCE-style policy gradient agent with optional baseline and normalization
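The optional baseline and normalization can be illustrated with a small helper that turns an episode's rewards into per-step weights for the update. This is a hypothetical sketch: the name compute_advantages, the baseline options (None, "mean", or a float such as a running average of past returns), and eps are assumptions, not this library's parameters. A baseline that does not depend on the current action (for example a running average) leaves the gradient estimate unbiased while typically reducing its variance. Subtracting the episode's own mean return and standardizing the weights are common heuristics that mainly stabilize the effective step size.

```python
import numpy as np


def compute_advantages(rewards, gamma=0.99, baseline=None, normalize=False, eps=1e-8):
    """Hypothetical helper: per-step weights for a REINFORCE update.

    baseline: None (plain REINFORCE), "mean" (subtract the episode's mean return),
              or a float (e.g. a running average of returns kept by the caller).
    normalize: if True, rescale the weights to zero mean and unit std.
    """
    # Discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Optional baseline subtraction
    if isinstance(baseline, str) and baseline == "mean":
        returns = returns - returns.mean()
    elif baseline is not None:
        returns = returns - float(baseline)
    # Optional standardization; eps avoids division by zero for constant returns
    if normalize:
        returns = (returns - returns.mean()) / (returns.std() + eps)
    return returns
```

With gamma=1.0 and rewards [1, 1, 1], the plain returns are [3, 2, 1]; a "mean" baseline turns them into [1, 0, -1], so the early, better-than-average steps are reinforced and the last step is discouraged.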