---
pipeline_tag: reinforcement-learning
tags:
- ppo
---
PPO agents trained in a self-play setting. The agents were trained only on observations from the left player's perspective. This repo includes checkpoints collected during training for 4 experiments:
- Shared weights for the actor and critic
- No shared weights
- Resumed training for additional steps, for both the shared and non-shared setups
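The difference between the shared and non-shared setups can be sketched as a toy actor-critic in plain Python. This card does not describe the actual network, so the layer sizes, the single ReLU hidden layer, and the names `Linear` and `ActorCritic` are hypothetical and only illustrate the idea. With shared weights, the policy head and the value head read from one common trunk. Without sharing, the critic gets its own trunk.

```python
import random

class Linear:
    """Hypothetical dense layer y = W x + b using plain Python lists."""
    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]

    def num_params(self):
        return len(self.w) * len(self.w[0]) + len(self.b)

def relu(x):
    return [max(0.0, v) for v in x]

class ActorCritic:
    """Hypothetical actor-critic; `shared` toggles a common trunk."""
    def __init__(self, obs_dim, n_actions, hidden, shared, seed=0):
        rng = random.Random(seed)
        self.shared = shared
        self.actor_trunk = Linear(obs_dim, hidden, rng)
        # When shared, the critic reuses the actor's trunk object.
        self.critic_trunk = self.actor_trunk if shared else Linear(obs_dim, hidden, rng)
        self.policy_head = Linear(hidden, n_actions, rng)  # action logits
        self.value_head = Linear(hidden, 1, rng)           # scalar state value

    def forward(self, obs):
        h_actor = relu(self.actor_trunk(obs))
        # Shared: reuse the actor features; otherwise compute critic features separately.
        h_critic = h_actor if self.shared else relu(self.critic_trunk(obs))
        return self.policy_head(h_actor), self.value_head(h_critic)[0]

    def num_params(self):
        # Deduplicate by identity so a shared trunk is counted once.
        layers = {id(l): l for l in (self.actor_trunk, self.critic_trunk,
                                     self.policy_head, self.value_head)}
        return sum(l.num_params() for l in layers.values())

shared_net = ActorCritic(obs_dim=8, n_actions=4, hidden=16, shared=True)
separate_net = ActorCritic(obs_dim=8, n_actions=4, hidden=16, shared=False)
print(shared_net.num_params(), separate_net.num_params())  # 229 373
```

A practical consequence of sharing is that gradients from the value loss also update the trunk the policy relies on, whereas in the non-shared setup each network is updated only by its own loss.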
Please see our [wandb report](https://wandb.ai/dumas/SPAR_RL_ELK/) for more details.