AuriAetherwiing committed on
Commit
65d88c5
1 Parent(s): 204b2bb

Update README.md

Files changed (1)
  1. README.md +62 -67
README.md CHANGED
@@ -1,7 +1,19 @@
  ---
  library_name: transformers
  license: other
  base_model: Qwen/Qwen2.5-72B
  tags:
  - generated_from_trainer
  model-index:
@@ -9,10 +21,55 @@ model-index:
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

  axolotl version: `0.4.1`
@@ -422,66 +479,4 @@ weight_decay: 0.1
  # fsdp_mixed_precision: BF16 # Added
  ```

- </details><br>
-
- # EVA-Qwen2.5-72B-SFFT-v0.0
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 3.2818
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 128
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 20
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.3286 | 0.0142 | 1 | 2.9734 |
- | 1.0713 | 0.2562 | 18 | 3.7951 |
- | 0.9051 | 0.5125 | 36 | 3.3342 |
- | 0.8746 | 0.7687 | 54 | 3.2625 |
- | 0.6216 | 1.0214 | 72 | 3.2244 |
- | 0.6158 | 1.2786 | 90 | 3.2810 |
- | 0.57 | 1.5357 | 108 | 3.2375 |
- | 0.5213 | 1.7929 | 126 | 3.1606 |
- | 0.3178 | 2.0427 | 144 | 3.2384 |
- | 0.2809 | 2.2989 | 162 | 3.2971 |
- | 0.3067 | 2.5552 | 180 | 3.2886 |
- | 0.3005 | 2.8114 | 198 | 3.2818 |
-
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.5.0+rocm6.1
- - Datasets 3.0.1
- - Tokenizers 0.20.1
 
  ---
  library_name: transformers
  license: other
+ license_name: qwen
+ license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
  base_model: Qwen/Qwen2.5-72B
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
  tags:
  - generated_from_trainer
  model-index:

  results: []
  ---

+ # EVA Qwen2.5-72B v0.0
+
+ <p>
+ An RP/storywriting specialist model, a full-parameter finetune of Qwen2.5-72B on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ </p>
+
+ <p>Note: using a quantized KV cache with Qwen2.5 <b>is not recommended</b> and can lead to degraded output quality. On the other hand, Qwen's KV cache is already light enough, so using f16 for it shouldn't be problematic.</p>
+
+ <p>Prompt format is ChatML.</p><br>
+ <h3>Recommended sampler values (applied in the sketch below):</h3>
+ <ul>
+ <li>Temperature: 1</li>
+ <li>Typical-P: 0.9</li>
+ <li>Min-P: 0.05</li>
+ <li>Top-A: 0.2</li>
+ <li>Repetition Penalty: 1.03</li>
+ </ul>
+
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
+
+ <h3>Training data:</h3>
+ <ul>
+ <li>Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's <a href="https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16">card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-Writing-Prompts by Gryphe.</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe.</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous.</li>
+ </ul>
+ <h3>Training time and hardware:</h3>
+ <ul><li>12 hours on 8xMI300X</li></ul>
+ <p>Model was trained by Kearm and Auri.</p>
+ <h4>Special thanks:</h4>
+ <ul>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm and Epiculous for the data;</li>
+ <li>to CalamitousFelicitousness for providing free inference for public beta testing;</li>
+ <li>and to Allura-org for support and feedback on EVA models.</li>
+ </ul>
+ <a href="https://github.com/axolotl-ai-cloud/axolotl"><img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/></a>
  <details><summary>See axolotl config</summary>

  axolotl version: `0.4.1`

  # fsdp_mixed_precision: BF16 # Added
  ```

+ </details><br>