Commit 03dbbef (verified) · Parent: 09d85a1
MatthiasPicard committed: Update README.md

Files changed (1): README.md (+19 -9)

README.md

This is a version of https://huggingface.co/answerdotai/ModernBERT-base that was fine-tuned on the Frugal-AI-Train-Data-88k dataset.
 
**Hyperparameters for reproduction**

- per_device_train_batch_size=16
- per_device_eval_batch_size=16
- num_train_epochs=2
- warmup_steps=500
- weight_decay=0.01
- learning_rate=2e-6
- lr_scheduler_type="cosine"
- fp16=True
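
These values map directly onto a Hugging Face `TrainingArguments`/`Trainer` setup. The sketch below is an illustration rather than the original training script: the dataset repo id, the `text`/`label` column names and `num_labels=8` are assumptions to adjust for the actual data.

```python
# Minimal fine-tuning sketch using the hyperparameters listed above.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=8  # assumed label count; set to the dataset's number of classes
)

# Assumed dataset id and column names; replace with the full Hub repo id of Frugal-AI-Train-Data-88k.
dataset = load_dataset("Frugal-AI-Train-Data-88k")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)  # "text" column is an assumption

tokenized = dataset.map(tokenize, batched=True)  # labels are assumed to live in a "label" column

args = TrainingArguments(
    output_dir="modernbert-frugal-ai",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    warmup_steps=500,
    weight_decay=0.01,
    learning_rate=2e-6,
    lr_scheduler_type="cosine",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"] if "test" in tokenized else None,
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```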

The model trained without the test set reached a score between 0.74 and 0.75 across different runs. The final submitted version also incorporated samples from the public test set into the training set.
 
**Other attempted methods**

During the challenge we also attempted the following approaches:

TF-IDF features feeding a RandomForest, Logistic Regression or XGBoost classifier: performance was around 0.54 on the test set, without overfitting.
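
A minimal sketch of this kind of baseline with a scikit-learn pipeline (placeholder data and illustrative settings, not the exact challenge code):

```python
# Hypothetical TF-IDF baseline: vectorize the quotes, then fit a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for the challenge quotes and labels.
train_texts = [
    "the climate has always changed naturally",
    "renewable energy capacity grew again this year",
    "scientists are hiding the real temperature data",
    "sea level rise is accelerating along the coast",
]
train_labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # illustrative vectorizer settings
    LogisticRegression(max_iter=1000),    # RandomForest or XGBoost would slot in the same way
)
clf.fit(train_texts, train_labels)
print(clf.predict(["a new quote to classify"]))
```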

ModernBERT-base embeddings feeding a RandomForest, Logistic Regression or XGBoost classifier: performance ranged from 0.62 to 0.7 on the test set, without overfitting.

ModernBERT-base embeddings concatenated with TF-IDF features, feeding the same classifiers: performance also ranged from 0.62 to 0.7 on the test set, without overfitting.
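
A minimal sketch of this embedding-plus-classifier approach (mean-pooled ModernBERT-base embeddings, optionally stacked with TF-IDF features; the pooling strategy, placeholder data and classifier settings are illustrative assumptions):

```python
# Hypothetical sketch: frozen ModernBERT-base embeddings, optionally concatenated
# with TF-IDF features, feeding a classical classifier.
import torch
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

texts = ["placeholder quote one", "placeholder quote two",
         "placeholder quote three", "placeholder quote four"]
labels = [0, 1, 0, 1]  # placeholder labels

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
encoder.eval()

@torch.no_grad()
def embed(batch_texts):
    enc = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
    hidden = out.last_hidden_state              # (batch, seq_len, hidden_dim)
    mask = enc["attention_mask"].unsqueeze(-1)  # mean-pool over non-padding tokens
    return ((hidden * mask).sum(dim=1) / mask.sum(dim=1)).numpy()

bert_feats = embed(texts)

# Optional: stack sparse TF-IDF features next to the dense embeddings.
tfidf = TfidfVectorizer().fit_transform(texts)
features = hstack([csr_matrix(bert_feats), tfidf], format="csr")

clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features[:1]))
```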

A Voting Classifier over ModernBERT-base embeddings, combining Logistic Regression, SGDClassifier, SVC and XGBoost: performance was around 0.7, without overfitting.
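
A minimal sketch of such a voting ensemble with scikit-learn and XGBoost (random placeholder features stand in for the ModernBERT embeddings; the estimator settings are illustrative):

```python
# Hypothetical voting ensemble; in practice the features would be the
# ModernBERT-base embeddings computed as in the previous sketch.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))          # placeholder embedding matrix
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # placeholder labels

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("sgd", SGDClassifier(loss="log_loss")),
        ("svc", SVC(probability=True)),
        ("xgb", XGBClassifier(n_estimators=50)),
    ],
    voting="hard",  # each model casts one vote; "soft" would average probabilities instead
)
voter.fit(features, labels)
print(voter.predict(features[:2]))
```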

DistilBERT or DeBERTa-Large fine-tuning: lower performance, from around 0.7 to 0.72, without overfitting.

Qwen2.5-3B-Instruct LoRA fine-tuned for sequence classification and quantized to 8 bits: higher performance, around 0.78 without overfitting, but a much higher carbon footprint. The model is stored at Qwen2.5-3B-FrugalAI.
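
A minimal sketch of an 8-bit LoRA setup with `peft` and `bitsandbytes` (the LoRA rank, target modules and label count are illustrative assumptions, not the exact configuration behind Qwen2.5-3B-FrugalAI):

```python
# Hypothetical 8-bit LoRA setup for sequence classification with Qwen2.5-3B-Instruct.
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)

model_name = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=8,                                       # assumed label count
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model.config.pad_token_id = tokenizer.pad_token_id      # classification with batching needs a pad token id

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                                               # illustrative LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen2 attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# The wrapped model can then be fine-tuned with a Trainer, as in the ModernBERT sketch above.
```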

As expected, overfitting always yielded a large performance increase.

In the end we chose to submit both the ModernBERT and Qwen models, trained on the whole uploaded dataset.