afrideva committed on
Commit 397ea63
1 Parent(s): 2627d73

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +93 -0
README.md ADDED
---
base_model: amazingvince/zephyr-smol_llama-100m-sft-full
inference: false
license: apache-2.0
model-index:
- name: zephyr-smol_llama-100m-sft-full
  results: []
model_creator: amazingvince
model_name: zephyr-smol_llama-100m-sft-full
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- generated_from_trainer
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
---

# amazingvince/zephyr-smol_llama-100m-sft-full-GGUF

Quantized GGUF model files for [zephyr-smol_llama-100m-sft-full](https://huggingface.co/amazingvince/zephyr-smol_llama-100m-sft-full) from [amazingvince](https://huggingface.co/amazingvince).

| Name | Quant method | Size |
| ---- | ------------ | ---- |
| [zephyr-smol_llama-100m-sft-full.fp16.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.fp16.gguf) | fp16 | 204.25 MB |
| [zephyr-smol_llama-100m-sft-full.q2_k.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf) | q2_k | 51.90 MB |
| [zephyr-smol_llama-100m-sft-full.q3_k_m.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q3_k_m.gguf) | q3_k_m | 58.04 MB |
| [zephyr-smol_llama-100m-sft-full.q4_k_m.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q4_k_m.gguf) | q4_k_m | 66.38 MB |
| [zephyr-smol_llama-100m-sft-full.q5_k_m.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q5_k_m.gguf) | q5_k_m | 75.31 MB |
| [zephyr-smol_llama-100m-sft-full.q6_k.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q6_k.gguf) | q6_k | 84.80 MB |
| [zephyr-smol_llama-100m-sft-full.q8_0.gguf](https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q8_0.gguf) | q8_0 | 109.33 MB |

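The download links above all follow the Hub's standard `resolve/main` URL layout. As a small sketch (the repo id and filename pattern are taken from the table; the `gguf_url` helper is ours, not part of this repo), a direct-download URL for any listed quant can be built like this:

```python
# Build the direct-download URL for one of the GGUF files listed above.
# Assumes the standard Hugging Face "resolve" URL layout used by the table links.
REPO_ID = "afrideva/zephyr-smol_llama-100m-sft-full-GGUF"
BASE_NAME = "zephyr-smol_llama-100m-sft-full"

def gguf_url(quant: str) -> str:
    """Return the resolve/main URL for a quant tag like 'q4_k_m' or 'fp16'."""
    filename = f"{BASE_NAME}.{quant}.gguf"
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{filename}"

print(gguf_url("q4_k_m"))
```

The resulting URL can be fetched with any HTTP client, or the file can be pulled locally with `huggingface_hub.hf_hub_download(REPO_ID, filename)`.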
## Original Model Card:

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-smol_llama-100m-sft-full

This model is a fine-tuned version of [BEE-spoke-data/smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.9579

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1

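The reported totals are consistent with the per-device settings: the effective train batch size is the per-device batch times the device count times the accumulation steps, and eval uses no accumulation. A quick sanity check of that arithmetic (variable names mirror the list above):

```python
# Effective batch sizes from the per-device hyperparameters listed above.
train_batch_size = 16
eval_batch_size = 16
num_devices = 2
gradient_accumulation_steps = 4

# total = per-device batch * devices * accumulation steps (eval skips accumulation)
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 32, matching the card
```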
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.9642 | 0.7 | 1141 | 1.9578 |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1