# Model Card for Model ID
- Summary-length PPO experiment #6.1
- No KL-divergence term in the loss
- Loss = -P*R/sum(P), instead of the standard policy-gradient Loss = (log(P)*R).mean()
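The two loss variants above can be sketched as follows. This is a minimal illustration, not the experiment's actual training code: the function names are made up here, and `P` and `R` are assumed to be per-sample action probabilities and rewards.

```python
import math

def prob_weighted_loss(P, R):
    # Loss = -P*R/sum(P): probability-weighted rewards,
    # normalized by the total probability mass, negated.
    return -sum(p * r for p, r in zip(P, R)) / sum(P)

def log_prob_loss(P, R):
    # Loss = (log(P)*R).mean(): the formula the card says was replaced.
    return sum(math.log(p) * r for p, r in zip(P, R)) / len(P)
```

Note that the first variant weights each reward by its raw probability rather than its log-probability, so low-probability samples contribute less to the gradient than they would under the standard surrogate.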
## Model Details
- Dataset size: 1024
- Epochs: 1
- Batch size: 4 per device * 4 GPUs * 8 gradient-accumulation steps (effective batch size 128)
Optimizer args: PyTorch AdamW defaults, except
- LR = 1e-5 (0.00001)
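The optimizer setup above corresponds to a one-line PyTorch configuration. This is a hedged sketch, assuming a PyTorch model; `model` here is a stand-in, since the card does not specify the architecture.

```python
import torch

# Hypothetical stand-in for the actual policy model (not specified in the card).
model = torch.nn.Linear(8, 1)

# PyTorch AdamW defaults (betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01),
# with only the learning rate overridden, as stated in the card.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```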