Update README.md
README.md
CHANGED
@@ -32,19 +32,19 @@ widget:
 
 <h3>Introduction</h3>
 
-This model is a <b>lightweight</b> and uncased version of <b>BERT</b> <b>[1]</b> for the <b>
+This model is a <b>lightweight</b> and uncased version of <b>BERT</b> <b>[1]</b> for the <b>Italian</b> language. Its <b>55M parameters</b> and <b>220MB</b> size make it
 <b>50% lighter</b> than a typical mono-lingual BERT model. It is ideal when memory consumption and execution speed are critical while maintaining high-quality results.
 
 
 <h3>Model description</h3>
 
 The model builds on the multilingual <b>DistilBERT</b> <b>[2]</b> model (from the HuggingFace team: [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased)) as a starting point,
-focusing it on the
+focusing it on the Italian language while at the same time turning it into an uncased model by modifying the embedding layer
 (as in <b>[3]</b>, but computing document-level frequencies over the <b>Wikipedia</b> dataset and setting a frequency threshold of 0.1%), which brings a considerable
 reduction in the number of parameters.
 
 To compensate for the deletion of cased tokens, which now forces the model to exploit lowercase representations of words previously capitalized,
-the model has been further pre-trained on the
+the model has been further pre-trained on the Italian split of the [Wikipedia](https://huggingface.co/datasets/wikipedia) dataset, using the <b>whole word masking [4]</b> technique to make it more robust
 to the new uncased representations.
 
 The resulting model has 55M parameters, a vocabulary of 13.832 tokens, and a size of 220MB, which makes it <b>50% lighter</b> than a typical mono-lingual BERT model and
@@ -53,7 +53,7 @@ The resulting model has 55M parameters, a vocabulary of 13.832 tokens, and a siz
 
 <h3>Training procedure</h3>
 
-The model has been trained for <b>masked language modeling</b> on the
+The model has been trained for <b>masked language modeling</b> on the Italian <b>Wikipedia</b> (~3GB) dataset for 10K steps, using the AdamW optimizer, with a batch size of 512
 (obtained through 128 gradient accumulation steps),
 a sequence length of 512, and a linearly decaying learning rate starting from 5e-5. The training has been performed using <b>dynamic masking</b> between epochs and
 exploiting the <b>whole word masking</b> technique.
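
For quick reference, the lightweight uncased checkpoint described in the Introduction can be exercised with a fill-mask pipeline. The sketch below assumes the `transformers` library; the model id is a placeholder, since the diff does not name the published repository.

```python
from transformers import pipeline

# Placeholder id: the diff does not name the published repository,
# so substitute the actual model id from the Hugging Face Hub.
MODEL_ID = "your-org/bert-italian-uncased-lightweight"

# The model is uncased, so lowercase input is the intended usage.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

for prediction in fill_mask("roma è la [MASK] d'italia."):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```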
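
The vocabulary reduction described under "Model description" (keep only tokens whose document-level frequency over Wikipedia exceeds 0.1%, then shrink the embedding layer accordingly) can be pictured with the rough sketch below. It is not the authors' exact procedure from [3]: the two-sentence corpus merely stands in for the Wikipedia dump, and rebuilding a matching tokenizer and re-tying the MLM head are left out.

```python
from collections import Counter

from transformers import AutoModel, AutoTokenizer

DOC_FREQ_THRESHOLD = 0.001  # the 0.1% document-frequency threshold from the README

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
model = AutoModel.from_pretrained("distilbert-base-multilingual-cased")

# Placeholder corpus: stands in for the Italian Wikipedia articles used by the authors.
documents = [
    "Roma è la capitale d'Italia.",
    "La Divina Commedia è un poema di Dante Alighieri.",
]

# Document-level frequency: in how many documents does each (lowercased) token occur?
doc_freq = Counter()
for doc in documents:
    doc_freq.update(set(tokenizer.tokenize(doc.lower())))  # lowercase for the uncased target

kept_tokens = set(tokenizer.all_special_tokens)
kept_tokens.update(
    tok for tok, n in doc_freq.items() if n / len(documents) >= DOC_FREQ_THRESHOLD
)
kept_ids = sorted(tokenizer.convert_tokens_to_ids(t) for t in kept_tokens)

# Slice the word-embedding matrix down to the reduced vocabulary; a matching
# tokenizer and a re-tied output projection are still required to use the model.
embeddings = model.get_input_embeddings().weight.detach()
reduced_embeddings = embeddings[kept_ids]
print(f"kept {reduced_embeddings.shape[0]} of {embeddings.shape[0]} token embeddings")
```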
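
The training procedure maps naturally onto the Hugging Face `Trainer`; the sketch below shows one way to wire up the stated hyperparameters (masked language modeling with whole word masking, 10K steps, AdamW, an effective batch size of 512 via 128 gradient accumulation steps, sequence length 512, and a linearly decaying 5e-5 learning rate). The authors' actual script is not part of this diff; the per-device batch size, the 15% masking rate, and the placeholder corpus and starting checkpoint are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForWholeWordMask,
    Trainer,
    TrainingArguments,
)

# Placeholder corpus: stands in for the Italian Wikipedia dump (~3GB) named in the README.
texts = ["Roma è la capitale d'Italia.", "Il Colosseo è un anfiteatro romano."]

# For a self-contained sketch we start from multilingual DistilBERT; the authors
# start from their already-pruned uncased variant instead.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-multilingual-cased")

def tokenize(batch):
    # Lowercase to match the uncased setup; fixed sequence length of 512.
    return tokenizer(
        [t.lower() for t in batch["text"]],
        truncation=True,
        padding="max_length",
        max_length=512,
    )

dataset = Dataset.from_dict({"text": texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

# Whole word masking applied on the fly, which also gives dynamic masking:
# every pass over the data draws fresh masks. 15% is the conventional BERT
# rate (not stated in the README).
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-italian-uncased-mlm",
    max_steps=10_000,                 # 10K training steps
    per_device_train_batch_size=4,    # 4 x 128 accumulation steps = batch size 512
    gradient_accumulation_steps=128,
    learning_rate=5e-5,
    lr_scheduler_type="linear",       # linearly decaying learning rate
    optim="adamw_torch",              # AdamW optimizer
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```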
|