This is just a test model. Do NOT use it for anything!

Continued pretraining of nb-roberta-base.

The domain-specific pretraining is done on the 102 GB [Scandinavian corpus](https://huggingface.co/datasets/NbAiLab/scandinavian).
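
The corpus can be inspected by streaming it with the `datasets` library rather than downloading all 102 GB up front, which is presumably also how `run_mlm_flax_stream.py` consumes it. A minimal sketch (`use_auth_token=True` mirrors the `--auth_token True` flag in the commands below):

```python
from datasets import load_dataset

# Stream the 102 GB corpus instead of downloading it in full.
dataset = load_dataset(
    "NbAiLab/scandinavian",
    split="train",
    streaming=True,
    use_auth_token=True,  # mirrors --auth_token True below
)

# Peek at the first example.
for sample in dataset.take(1):
    print(sample)
```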

First, train for 180k steps with a sequence length of 128:

```bash
./run_mlm_flax_stream.py \
    --output_dir="./" \
    --model_type="roberta" \
    --config_name="./" \
    --tokenizer_name="./" \
    --model_name_or_path="./" \
    --dataset_name="NbAiLab/scandinavian" \
    --max_seq_length="128" \
    --weight_decay="0.01" \
    --per_device_train_batch_size="128" \
    --per_device_eval_batch_size="128" \
    --learning_rate="6e-5" \
    --warmup_steps="5000" \
    --overwrite_output_dir \
    --cache_dir /mnt/disks/flaxdisk/cache/ \
    --num_train_steps="180000" \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --logging_steps="10000" \
    --save_steps="10000" \
    --eval_steps="10000" \
    --preprocessing_num_workers 96 \
    --auth_token True \
    --adafactor \
    --push_to_hub
```
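
After this phase, the checkpoint written to `--output_dir` can be loaded back as an ordinary Flax RoBERTa masked-LM. A minimal sketch (the path `./` and the Norwegian example sentence are illustrative):

```python
from transformers import FlaxRobertaForMaskedLM, RobertaTokenizerFast

# Load the checkpoint written to --output_dir ("./") by the run above.
tokenizer = RobertaTokenizerFast.from_pretrained("./")
model = FlaxRobertaForMaskedLM.from_pretrained("./")

inputs = tokenizer("Dette er en <mask> setning.", return_tensors="np")
logits = model(**inputs).logits

# Most likely token at the masked position.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).argmax()
predicted_id = int(logits[0, mask_index].argmax(-1))
print(tokenizer.decode([predicted_id]))
```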

Then, train for 20k steps with a sequence length of 512:

```bash
./run_mlm_flax_stream.py \
    --output_dir="./" \
    --model_type="roberta" \
    --config_name="./" \
    --tokenizer_name="./" \
    --model_name_or_path="./" \
    --dataset_name="NbAiLab/scandinavian" \
    --max_seq_length="512" \
    --weight_decay="0.01" \
    --per_device_train_batch_size="48" \
    --per_device_eval_batch_size="48" \
    --learning_rate="3e-5" \
    --warmup_steps="5000" \
    --overwrite_output_dir \
    --cache_dir /mnt/disks/flaxdisk/cache/ \
    --num_train_steps="20000" \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --logging_steps="20000" \
    --save_steps="10000" \
    --eval_steps="10000" \
    --preprocessing_num_workers 96 \
    --auth_token True \
    --adafactor \
    --push_to_hub
```
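
For a rough sense of scale, the per-device token budget implied by the two commands works out as below (a back-of-the-envelope sketch; it ignores the number of devices and any sequence packing):

```python
# tokens per device = steps * per_device_batch_size * max_seq_length
phase1 = 180_000 * 128 * 128  # ~2.95e9 tokens at sequence length 128
phase2 = 20_000 * 48 * 512    # ~4.92e8 tokens at sequence length 512
print(f"{phase1:.3g} + {phase2:.3g} = {phase1 + phase2:.3g} tokens per device")
```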

Approximate additional training time: 1 week.
