---
pipeline_tag: reinforcement-learning
tags:
- ppo
---
PPO agents trained in a self-play setting. The agents were trained on observations as the left player only. This repo includes checkpoints collected during training for 4 experiments:
- Shared weights for actor and critic
- No shared weights
- Resumed training for extra steps, for both the shared and non-shared setups

Please check our [wandb report](https://wandb.ai/dumas/SPAR_RL_ELK/) for more details.
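
To illustrate the difference between the shared and non-shared setups listed above, here is a minimal, framework-free sketch. It is not the architecture used for these checkpoints: the class name `ActorCritic`, the layer sizes, the `tanh` activation, and the pure-Python layers are all assumptions made for illustration. In the shared setup the actor and critic read from the same trunk, while in the non-shared setup each has its own.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation=None):
    """Apply a dense layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


class ActorCritic:
    """Toy actor-critic (hypothetical); `shared` selects one trunk or two."""

    def __init__(self, obs_dim, n_actions, hidden=8, shared=True, seed=0):
        rng = random.Random(seed)
        self.actor_trunk = make_layer(obs_dim, hidden, rng)
        # Shared setup: the critic reuses the very same trunk object.
        # Non-shared setup: the critic gets its own, independent trunk.
        self.critic_trunk = self.actor_trunk if shared else make_layer(obs_dim, hidden, rng)
        self.policy_head = make_layer(hidden, n_actions, rng)
        self.value_head = make_layer(hidden, 1, rng)

    def forward(self, obs):
        """Return (action logits, state value) for one observation vector."""
        logits = dense(self.policy_head, dense(self.actor_trunk, obs, math.tanh))
        value = dense(self.value_head, dense(self.critic_trunk, obs, math.tanh))[0]
        return logits, value


# Assumed toy dimensions, unrelated to the real environment.
shared_net = ActorCritic(obs_dim=4, n_actions=3, shared=True)
separate_net = ActorCritic(obs_dim=4, n_actions=3, shared=False)
```

With shared weights, gradients from both the policy loss and the value loss update the same trunk, whereas separate trunks keep the two objectives from interfering at the cost of more parameters.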