powerpuf-bot/mdeberta-v3-th-wiki-qa_hyp-params

+---
+license: mit
+base_model: timpal0l/mdeberta-v3-base-squad2
+tags:
+- generated_from_trainer
+model-index:
+- name: model1
+  results: []
+---
+# model1
+This model is a fine-tuned version of [timpal0l/mdeberta-v3-base-squad2](https://huggingface.co/timpal0l/mdeberta-v3-base-squad2) on an unspecified dataset.
+It achieves the following results on the evaluation set:
+- Loss: 2.4437
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1.2922909480977358e-06
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
+- lr_scheduler_type: linear
+- training_steps: 500
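
As a rough illustration of what `lr_scheduler_type: linear` implies over the 500 training steps, the sketch below computes the learning rate at a given step. It assumes zero warmup steps (the card lists none) and a decay that reaches zero at the final step. The helper name `linear_lr` is hypothetical and not part of any library.

```python
# Hypothetical helper: linear decay from the peak learning rate to zero.
# Assumption: no warmup steps, since the card does not list any.
PEAK_LR = 1.2922909480977358e-06  # learning_rate from the card
TOTAL_STEPS = 500                 # training_steps from the card


def linear_lr(step: int, peak_lr: float = PEAK_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Return the learning rate in effect after `step` optimizer updates."""
    remaining = max(0, total_steps - step)  # clamp so the rate never goes negative
    return peak_lr * remaining / total_steps


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of training.
    for step in (0, 250, 500):
        print(step, linear_lr(step))
```

Under these assumptions the rate is halved by step 250 and reaches zero at step 500, which may help explain why the validation loss flattens over the last rows of the results table.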
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 4.2146        | 0.0   | 10   | 4.1627          |
+| 4.1667        | 0.0   | 20   | 3.9059          |
+| 4.0555        | 0.01  | 30   | 3.7982          |
+| 3.9331        | 0.01  | 40   | 3.7342          |
+| 3.8012        | 0.01  | 50   | 3.6719          |
+| 3.7713        | 0.01  | 60   | 3.6077          |
+| 3.8391        | 0.02  | 70   | 3.5548          |
+| 3.8842        | 0.02  | 80   | 3.5134          |
+| 3.6894        | 0.02  | 90   | 3.4823          |
+| 3.5359        | 0.02  | 100  | 3.4466          |
+| 3.6247        | 0.03  | 110  | 3.4096          |
+| 3.6347        | 0.03  | 120  | 3.3807          |
+| 3.5752        | 0.03  | 130  | 3.3459          |
+| 3.467         | 0.03  | 140  | 3.2778          |
+| 3.6188        | 0.04  | 150  | 3.2198          |
+| 3.444         | 0.04  | 160  | 3.1880          |
+| 3.4635        | 0.04  | 170  | 3.1494          |
+| 3.3998        | 0.04  | 180  | 3.1107          |
+| 3.1465        | 0.04  | 190  | 3.0675          |
+| 3.4321        | 0.05  | 200  | 3.0380          |
+| 3.3174        | 0.05  | 210  | 3.0122          |
+| 3.6018        | 0.05  | 220  | 2.9566          |
+| 3.4178        | 0.05  | 230  | 2.9099          |
+| 3.2037        | 0.06  | 240  | 2.8755          |
+| 3.3974        | 0.06  | 250  | 2.8493          |
+| 3.109         | 0.06  | 260  | 2.8209          |
+| 3.1127        | 0.06  | 270  | 2.7751          |
+| 3.2408        | 0.07  | 280  | 2.7458          |
+| 3.274         | 0.07  | 290  | 2.7211          |
+| 3.0695        | 0.07  | 300  | 2.6946          |
+| 2.9757        | 0.07  | 310  | 2.6713          |
+| 3.0846        | 0.08  | 320  | 2.6415          |
+| 3.0576        | 0.08  | 330  | 2.6209          |
+| 2.8623        | 0.08  | 340  | 2.6041          |
+| 3.165         | 0.08  | 350  | 2.5913          |
+| 2.8874        | 0.08  | 360  | 2.5797          |
+| 3.046         | 0.09  | 370  | 2.5627          |
+| 2.8727        | 0.09  | 380  | 2.5391          |
+| 2.7942        | 0.09  | 390  | 2.5188          |
+| 2.7494        | 0.09  | 400  | 2.5031          |
+| 2.8419        | 0.1   | 410  | 2.4905          |
+| 2.8411        | 0.1   | 420  | 2.4792          |
+| 2.9188        | 0.1   | 430  | 2.4696          |
+| 2.9239        | 0.1   | 440  | 2.4622          |
+| 3.0064        | 0.11  | 450  | 2.4551          |
+| 2.9781        | 0.11  | 460  | 2.4504          |
+| 2.8582        | 0.11  | 470  | 2.4483          |
+| 2.8701        | 0.11  | 480  | 2.4456          |
+| 2.7012        | 0.12  | 490  | 2.4442          |
+| 2.827         | 0.12  | 500  | 2.4437          |
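
The Epoch and Step columns together hint at the size of the training set. The sketch below works this out from the last row, assuming that training ran on a single device without gradient accumulation (neither is listed in the card) and that the Epoch column is rounded to two decimals. The result is a rough range for the number of training features, not a reported figure, and the function name `dataset_size_bounds` is hypothetical.

```python
# Rough estimate of training-set size from the last table row.
# Assumptions (not stated in the card): single device, no gradient
# accumulation, and an Epoch column rounded to two decimal places.
TRAIN_BATCH_SIZE = 4   # train_batch_size from the card
FINAL_STEP = 500       # last Step in the table
FINAL_EPOCH = 0.12     # Epoch reported at step 500


def dataset_size_bounds(step, epoch, batch_size, rounding=0.005):
    """Return (low, high) bounds on the number of training features."""
    seen = step * batch_size          # features processed so far
    low = seen / (epoch + rounding)   # largest epoch fraction consistent with rounding
    high = seen / (epoch - rounding)  # smallest epoch fraction consistent with rounding
    return low, high


low, high = dataset_size_bounds(FINAL_STEP, FINAL_EPOCH, TRAIN_BATCH_SIZE)
print(f"features seen: {FINAL_STEP * TRAIN_BATCH_SIZE}")
print(f"estimated training features: {low:.0f} to {high:.0f}")
```

With these assumptions, 2,000 features were processed in 500 steps, which puts the training set at roughly 16,000 to 17,400 features. The run therefore covered only about an eighth of one pass over the data.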
+### Framework versions
+- Transformers 4.35.2
+- Pytorch 2.1.0+cu121
+- Datasets 2.15.0
+- Tokenizers 0.15.0