MachineLearning/policy_gradient library

Policy Gradient (REINFORCE) - minimal numeric implementation

A concise REINFORCE-style policy gradient agent that uses an MLP policy to produce action probabilities over discrete actions. The implementation is intentionally small, for teaching and unit testing, and applies no baseline by default. It still exposes the core API needed in real experiments: selectAction and updateFromEpisode.
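To make the two-method workflow concrete, here is a minimal Python sketch of a REINFORCE agent. It is not this library's code: the class name TinyReinforce, the snake_case method names (mirroring selectAction and updateFromEpisode), the argument lists, and the hyperparameters are all assumptions made for illustration, and a linear softmax policy stands in for the MLP to keep the gradient short enough to read.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyReinforce:
    """Hypothetical stand-in agent: linear softmax policy, not an MLP."""

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.99, seed=0):
        # One weight row per action; all names and defaults are assumptions.
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr
        self.gamma = gamma
        self.rng = random.Random(seed)

    def action_probs(self, state):
        logits = [sum(wi * si for wi, si in zip(row, state)) for row in self.w]
        return softmax(logits)

    def select_action(self, state):
        """Sample an action index from the current policy."""
        probs = self.action_probs(state)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update_from_episode(self, states, actions, rewards):
        """One REINFORCE step: ascend sum_t G_t * grad log pi(a_t | s_t)."""
        # Discounted returns G_t, computed backwards through the episode.
        g = 0.0
        returns = []
        for r in reversed(rewards):
            g = r + self.gamma * g
            returns.append(g)
        returns.reverse()

        # Accumulate the gradient first so every step uses the same policy.
        grad = [[0.0] * len(row) for row in self.w]
        for s, a, g_t in zip(states, actions, returns):
            probs = self.action_probs(s)
            for k in range(len(self.w)):
                # d log pi(a|s) / d w_k = (1[k == a] - pi(k|s)) * s
                coeff = (1.0 if k == a else 0.0) - probs[k]
                for j, sj in enumerate(s):
                    grad[k][j] += g_t * coeff * sj

        for k, row in enumerate(self.w):
            for j in range(len(row)):
                row[j] += self.lr * grad[k][j]
```

A typical loop under these assumptions calls select_action once per environment step, records the state, action, and reward, and then passes the three lists to update_from_episode when the episode ends.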

Classes

PolicyGradient
REINFORCE-style Policy Gradient with optional baseline and normalization
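
The optional baseline and normalization can be illustrated with a small Python sketch of how per-step returns might be preprocessed before the gradient step. The function names, the flags use_baseline and normalize, the choice of the episode's mean return as the baseline, and the eps constant are all assumptions for illustration; the class may implement these options differently.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    g = 0.0
    out = []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out


def preprocess_returns(returns, use_baseline=False, normalize=False, eps=1e-8):
    """Optionally subtract a mean baseline and/or standardize the returns.

    Assumption: the baseline is the mean return of the episode. Other
    baselines (e.g. a learned value function) are equally valid choices.
    """
    out = list(returns)
    if not out:
        return out
    if use_baseline:
        mean = sum(out) / len(out)
        out = [g - mean for g in out]
    if normalize:
        mean = sum(out) / len(out)
        var = sum((g - mean) ** 2 for g in out) / len(out)
        std = math.sqrt(var)
        # eps guards against division by zero when all returns are equal.
        out = [(g - mean) / (std + eps) for g in out]
    return out
```

Subtracting a baseline leaves the expected gradient unchanged while typically reducing its variance, and standardizing keeps the update scale comparable across episodes with very different reward magnitudes.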