YAML Metadata
Error:
"language[0]" must only contain lowercase characters
YAML Metadata
Error:
"language[0]" with value "English" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like "code", "multilingual". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.
roberta-base for MLM
Objective: To make a Roberta Base for the Movie Domain by using various Movie Datasets as simple text for Masked Language Modeling. This is the Movie Roberta to be used in Movie Domain applications.
model_name = "thatdramebaazguy/movie-roberta-base"
pipeline(model=model_name, tokenizer=model_name, revision="v1.0", task="Fill-Mask")
Overview
Language model: roberta-base
Language: English
Downstream-task: Fill-Mask
Training data: imdb, polarity movie data, cornell_movie_dialogue, 25mlens movie names
Eval data: imdb, polarity movie data, cornell_movie_dialogue, 25mlens movie names
Infrastructure: 4x Tesla v100
Code: See example
Hyperparameters
Num examples = 4767233
Num Epochs = 2
Instantaneous batch size per device = 20
Total train batch size (w. parallel, distributed & accumulation) = 80
Gradient Accumulation steps = 1
Total optimization steps = 119182
eval_loss = 1.6153
eval_samples = 20573
perplexity = 5.0296
learning_rate=5e-05
n_gpu = 4
Performance
perplexity = 5.0296
Some of my work:
- Downloads last month
- 14
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.