karimouda committed on
Commit 869356e
1 Parent(s): a61a78e

Update README.md

Some word and spelling changes

Files changed (1)
  1. README.md +4 -4

README.md CHANGED
@@ -363,7 +363,7 @@ This model was fine-tuned via 2 phases:
 
 In phase `1`, we curated a dataset [silma-ai/silma-arabic-triplets-dataset-v1.0](https://huggingface.co/datasets/silma-ai/silma-arabic-triplets-dataset-v1.0) which
 contains more than `2.25M` records of (anchor, positive and negative) Arabic/English samples.
-Only the first `600` samples were taken to be the `eval` dataset, while the rest was used for fine-tuning.
+Only the first `600` samples were taken to be the `eval` dataset, while the rest were used for fine-tuning.
 
 Phase `1` produces a finetuned `Matryoshka` model based on [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) with the following hyperparameters:
 
@@ -376,7 +376,7 @@ Phase `1` produces a finetuned `Matryoshka` model based on [aubmindlab/bert-base
 - `optim`: adamw_torch_fused
 - `batch_sampler`: no_duplicates
 
-**[trainin-example](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/matryoshka/matryoshka_sts.py)**
+**[training script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/matryoshka/matryoshka_sts.py)**
 
 
 ### Phase 2:
@@ -385,7 +385,7 @@ In phase `2`, we curated a dataset [silma-ai/silma-arabic-english-sts-dataset-v1
 contains more than `30k` records of (sentence1, sentence2 and similarity-score) Arabic/English samples.
 Only the first `100` samples were taken to be the `eval` dataset, while the rest was used for fine-tuning.
 
-Phase `1` produces a finetuned `STS` model based on the model from phase `1`, with the following hyperparameters:
+Phase `2` produces a finetuned `STS` model based on the model from phase `1`, with the following hyperparameters:
 
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 250
@@ -397,7 +397,7 @@ Phase `1` produces a finetuned `STS` model based on the model from phase `1`, wi
 - `optim`: adamw_torch_fused
 - `batch_sampler`: no_duplicates
 
-**[trainin-example](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py)**
+**[training script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py)**
 
 
 </details>
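The phase-1 setup described in the README trains on (anchor, positive, negative) triplets under a Matryoshka objective: the same ranking loss is applied to truncated prefixes of the embedding, so the first few dimensions alone must already rank the positive above the negative. A minimal NumPy illustration of that idea — the prefix dimensions, margin, and function names here are illustrative assumptions, not the README's hyperparameters or the actual sentence-transformers `MatryoshkaLoss` implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matryoshka_triplet_loss(anchor, positive, negative,
                            dims=(8, 4, 2), margin=0.5):
    """Average a margin-based triplet loss over truncated embedding prefixes.

    Truncating to each d in `dims` mimics the Matryoshka objective: every
    prefix of the embedding, not just the full vector, is pushed to score
    the positive above the negative.
    """
    losses = []
    for d in dims:
        a, p, n = anchor[:d], positive[:d], negative[:d]
        losses.append(max(0.0, margin - cosine(a, p) + cosine(a, n)))
    return sum(losses) / len(dims)

# Toy triplet: positive is a near-duplicate of the anchor,
# negative points in the opposite direction.
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.01 * rng.normal(size=8)
negative = -anchor

good = matryoshka_triplet_loss(anchor, positive, negative)
bad = matryoshka_triplet_loss(anchor, negative, positive)  # roles swapped
```

With a well-separated triplet the loss is zero at every prefix length, while swapping the positive and negative drives it up — which is exactly the property that lets the fine-tuned model's embeddings be truncated after training.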
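Phase 2's dataset pairs sentences with a gold similarity score, so the continued training regresses the cosine similarity of the two embeddings toward that score. A minimal NumPy sketch of the shape of such an objective — the names and the plain MSE form are assumptions for illustration, not the exact loss used in the linked training script:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sts_regression_loss(embs1, embs2, gold_scores):
    """Mean squared error between cosine similarities and gold STS scores.

    Each (sentence1, sentence2, similarity-score) record pushes
    cos(emb1, emb2) toward the annotated score.
    """
    preds = np.array([cosine(a, b) for a, b in zip(embs1, embs2)])
    return float(np.mean((preds - np.asarray(gold_scores)) ** 2))

# Toy batch: an identical pair (gold 1.0) and an orthogonal pair (gold 0.0).
embs1 = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
embs2 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

loss_matched = sts_regression_loss(embs1, embs2, [1.0, 0.0])
loss_flipped = sts_regression_loss(embs1, embs2, [0.0, 1.0])
```

When the embeddings already agree with the gold scores the loss vanishes, and flipping the labels makes it large, which is the gradient signal the phase-2 fine-tuning exploits on the `30k` STS records.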