Commit 03dbbef (verified) · Parent: 09d85a1
MatthiasPicard committed: Update README.md

Files changed (1): README.md (+19 -9)

README.md

This is a version of https://huggingface.co/answerdotai/ModernBERT-base that was fine-tuned on the Frugal-AI-Train-Data-88k dataset.
 
**Hyperparameters for reproduction**

- per_device_train_batch_size=16
- per_device_eval_batch_size=16
- num_train_epochs=2
- warmup_steps=500
- weight_decay=0.01
- learning_rate=2e-6
- lr_scheduler_type="cosine"
- fp16=True
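
These values map directly onto a Hugging Face `TrainingArguments`/`Trainer` setup. The sketch below is an illustration rather than the original training script: the dataset repo id, the `text`/`label` column names and `num_labels=8` are assumptions to adjust for the actual data.

```python
# Minimal fine-tuning sketch using the hyperparameters listed above.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=8  # assumed label count; set to the dataset's number of classes
)

# Assumed dataset id and column names; replace with the full Hub repo id of Frugal-AI-Train-Data-88k.
dataset = load_dataset("Frugal-AI-Train-Data-88k")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)  # "text" column is an assumption

tokenized = dataset.map(tokenize, batched=True)  # labels are assumed to live in a "label" column

args = TrainingArguments(
    output_dir="modernbert-frugal-ai",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    warmup_steps=500,
    weight_decay=0.01,
    learning_rate=2e-6,
    lr_scheduler_type="cosine",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"] if "test" in tokenized else None,
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```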

The model trained without the test set reached a score between 0.74 and 0.75 across different runs. The final submitted version also incorporated samples from the public test set into the training set.
 
**Other attempted methods**

During the challenge we also attempted the following approaches:

TF-IDF features feeding a RandomForest, Logistic Regression or XGBoost classifier: performance was around 0.54 on the test set, without overfitting.
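
A minimal sketch of this kind of baseline with a scikit-learn pipeline (placeholder data and illustrative settings, not the exact challenge code):

```python
# Hypothetical TF-IDF baseline: vectorize the quotes, then fit a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for the challenge quotes and labels.
train_texts = [
    "the climate has always changed naturally",
    "renewable energy capacity grew again this year",
    "scientists are hiding the real temperature data",
    "sea level rise is accelerating along the coast",
]
train_labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # illustrative vectorizer settings
    LogisticRegression(max_iter=1000),    # RandomForest or XGBoost would slot in the same way
)
clf.fit(train_texts, train_labels)
print(clf.predict(["a new quote to classify"]))
```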

ModernBERT-base embeddings feeding a RandomForest, Logistic Regression or XGBoost classifier: performance ranged from 0.62 to 0.7 on the test set, without overfitting.

ModernBERT-base embeddings concatenated with TF-IDF features, feeding the same classifiers: performance also ranged from 0.62 to 0.7 on the test set, without overfitting.
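
A minimal sketch of this embedding-plus-classifier approach (mean-pooled ModernBERT-base embeddings, optionally stacked with TF-IDF features; the pooling strategy, placeholder data and classifier settings are illustrative assumptions):

```python
# Hypothetical sketch: frozen ModernBERT-base embeddings, optionally concatenated
# with TF-IDF features, feeding a classical classifier.
import torch
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

texts = ["placeholder quote one", "placeholder quote two",
         "placeholder quote three", "placeholder quote four"]
labels = [0, 1, 0, 1]  # placeholder labels

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
encoder.eval()

@torch.no_grad()
def embed(batch_texts):
    enc = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
    hidden = out.last_hidden_state              # (batch, seq_len, hidden_dim)
    mask = enc["attention_mask"].unsqueeze(-1)  # mean-pool over non-padding tokens
    return ((hidden * mask).sum(dim=1) / mask.sum(dim=1)).numpy()

bert_feats = embed(texts)

# Optional: stack sparse TF-IDF features next to the dense embeddings.
tfidf = TfidfVectorizer().fit_transform(texts)
features = hstack([csr_matrix(bert_feats), tfidf], format="csr")

clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features[:1]))
```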

A Voting Classifier over ModernBERT-base embeddings, combining Logistic Regression, SGDClassifier, SVC and XGBoost: performance was around 0.7, without overfitting.
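
A minimal sketch of such a voting ensemble with scikit-learn and XGBoost (random placeholder features stand in for the ModernBERT embeddings; the estimator settings are illustrative):

```python
# Hypothetical voting ensemble; in practice the features would be the
# ModernBERT-base embeddings computed as in the previous sketch.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))          # placeholder embedding matrix
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # placeholder labels

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("sgd", SGDClassifier(loss="log_loss")),
        ("svc", SVC(probability=True)),
        ("xgb", XGBClassifier(n_estimators=50)),
    ],
    voting="hard",  # each model casts one vote; "soft" would average probabilities instead
)
voter.fit(features, labels)
print(voter.predict(features[:2]))
```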

DistilBERT or DeBERTa-Large fine-tuning: lower performance, from around 0.7 to 0.72, without overfitting.

Qwen2.5-3B-Instruct LoRA fine-tuned for sequence classification and quantized to 8 bits: higher performance, around 0.78 without overfitting, but a much higher carbon footprint. The model is stored at Qwen2.5-3B-FrugalAI.
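
A minimal sketch of an 8-bit LoRA setup with `peft` and `bitsandbytes` (the LoRA rank, target modules and label count are illustrative assumptions, not the exact configuration behind Qwen2.5-3B-FrugalAI):

```python
# Hypothetical 8-bit LoRA setup for sequence classification with Qwen2.5-3B-Instruct.
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)

model_name = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=8,                                       # assumed label count
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model.config.pad_token_id = tokenizer.pad_token_id      # classification with batching needs a pad token id

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                                               # illustrative LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen2 attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# The wrapped model can then be fine-tuned with a Trainer, as in the ModernBERT sketch above.
```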

As expected, overfitting always yielded a large performance increase.

In the end we chose to submit both the ModernBERT and Qwen models, trained on the whole uploaded dataset.