## Model Details

| Nano LMs | Non-emb Params | Arch | Layers | Dim | Heads | Seq Len |
| :------: | :------------: | :----------------: | :----: | :--: | :---: | :-----: |
| 25M      | 15M            | MistralForCausalLM | 12     | 312  | 12    | 2K      |
| 70M      | 42M            | LlamaForCausalLM   | 12     | 576  | 9     | 2K      |
| 0.3B     | 180M           | Qwen2ForCausalLM   | 12     | 896  | 14    | 4K      |
| 1B       | 840M           | Qwen2ForCausalLM   | 18     | 1536 | 12    | 4K      |

The tokenizer and model architecture of NanoLM-0.3B-Instruct-v1.1 are the same as [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B), but the number of layers has been reduced from 24 to 12.

As a result, NanoLM-0.3B-Instruct-v1.1 has only 0.3 billion parameters, with approximately **180 million non-embedding parameters**.
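As a quick sanity check on that figure, here is a back-of-the-envelope count of the non-embedding parameters. The intermediate size (4864) and the 2 KV heads are assumptions carried over from the Qwen2-0.5B config rather than values stated in this card:

```python
# Rough non-embedding parameter estimate for NanoLM-0.3B-Instruct-v1.1.
# Assumes Qwen2-0.5B per-layer shapes (intermediate_size=4864, 2 KV heads)
# with the layer count halved to 12; bias terms and norms are ignored.

hidden = 896                 # model dim (from the table above)
layers = 12                  # reduced from Qwen2-0.5B's 24
heads = 14                   # query heads
kv_heads = 2                 # assumed, as in Qwen2-0.5B (grouped-query attention)
head_dim = hidden // heads   # 64
intermediate = 4864          # assumed, as in Qwen2-0.5B

attn = (hidden * hidden                     # q_proj
        + 2 * hidden * kv_heads * head_dim  # k_proj + v_proj
        + hidden * hidden)                  # o_proj
mlp = 3 * hidden * intermediate             # gate, up, down (SwiGLU)
non_emb = layers * (attn + mlp)

print(f"{non_emb / 1e6:.0f}M non-embedding parameters")  # prints "179M non-embedding parameters"
```

Halving the layers of Qwen2-0.5B (about 0.35B non-embedding parameters) lands right at the quoted ~180M.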

Despite this, NanoLM-0.3B-Instruct-v1.1 still demonstrates strong instruction-following capabilities.

Here are some examples. For reproducibility purposes, I've set `do_sample` to `False`. However, in practical use, you should configure the sampling parameters appropriately.