pipeline_tag: reinforcement-learning
tags:
  - ppo

PPO agents trained in a self-play setting. The agents were trained on observations from the left player's perspective only. This repo includes checkpoints collected during training for 4 experiments:

  • Shared weights for actor and critic
  • Separate (non-shared) weights for actor and critic
  • Resumed training for extra steps, for both the shared and non-shared setups
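
To illustrate the difference between the shared and non-shared setups, here is a minimal pure-Python sketch. It is not the code used to train these checkpoints: the class names, layer sizes, and single-hidden-layer architecture are all hypothetical, chosen only to show where the weights are shared.

```python
import random


def make_layer(n_in, n_out, rng):
    """Dense layer stored as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, x, relu=True):
    """Apply a dense layer to the input vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def count_params(layers):
    """Total number of weights and biases across a list of layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


class SharedActorCritic:
    """Actor and critic share one hidden trunk (hypothetical architecture)."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.trunk = make_layer(obs_dim, hidden_dim, rng)
        self.policy_head = make_layer(hidden_dim, n_actions, rng)
        self.value_head = make_layer(hidden_dim, 1, rng)

    def __call__(self, obs):
        h = forward(self.trunk, obs)  # features used by both heads
        logits = forward(self.policy_head, h, relu=False)
        value = forward(self.value_head, h, relu=False)[0]
        return logits, value

    def layers(self):
        return [self.trunk, self.policy_head, self.value_head]


class SeparateActorCritic:
    """Actor and critic each have their own trunk (hypothetical architecture)."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.actor_trunk = make_layer(obs_dim, hidden_dim, rng)
        self.critic_trunk = make_layer(obs_dim, hidden_dim, rng)
        self.policy_head = make_layer(hidden_dim, n_actions, rng)
        self.value_head = make_layer(hidden_dim, 1, rng)

    def __call__(self, obs):
        logits = forward(self.policy_head, forward(self.actor_trunk, obs), relu=False)
        value = forward(self.value_head, forward(self.critic_trunk, obs), relu=False)[0]
        return logits, value

    def layers(self):
        return [self.actor_trunk, self.critic_trunk, self.policy_head, self.value_head]


if __name__ == "__main__":
    # Sizes below are placeholders, not the dimensions of the released agents.
    shared = SharedActorCritic(obs_dim=4, hidden_dim=8, n_actions=3)
    separate = SeparateActorCritic(obs_dim=4, hidden_dim=8, n_actions=3)
    print(count_params(shared.layers()), count_params(separate.layers()))
```

In the shared variant, gradients from both the policy loss and the value loss update the same trunk. In the separate variant the two losses never touch the same weights, at the cost of a second trunk's worth of parameters.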