A small version of DeBERTa trained on the clean version of Google's C4 dataset. For details about the model's size and architecture, see `config.json`.
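
For instance, the size parameters can be inspected programmatically. A minimal sketch, assuming the checkpoint's `config.json` exposes the standard DeBERTa configuration fields:

```python
from transformers import AutoConfig

# Download and parse config.json from the Hugging Face Hub.
config = AutoConfig.from_pretrained("lucadiliello/deberta-small")

# Standard DeBERTa config fields; assumed to be present for this checkpoint.
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
```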

The model was trained for 100K steps with a batch size of 2048 and a sequence length of 512, for a total of roughly 105B tokens (100,000 × 2048 × 512 ≈ 1.05 × 10¹¹).

The vocabulary and the tokenizer are the same as those of `microsoft/deberta-base`.
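
A minimal usage sketch, assuming the checkpoint loads via the standard `transformers` Auto classes:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The tokenizer is shared with microsoft/deberta-base, so either
# repository id should yield equivalent tokenization.
tokenizer = AutoTokenizer.from_pretrained("lucadiliello/deberta-small")
model = AutoModel.from_pretrained("lucadiliello/deberta-small")

# Encode a sentence and extract contextual token embeddings.
inputs = tokenizer("DeBERTa uses disentangled attention.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```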
