liushaowei committed · Commit 579a3a3 · Parent: 4308c42

update readme

README.md CHANGED
````diff
@@ -3,14 +3,14 @@ license: mit
 library_name: transformers
 ---
 <div align="center">
-  <a href="https://github.com/MoonshotAI/
+  <a href="https://github.com/MoonshotAI/Moonlight"><img width="80%" src="figures/banner.png"></a>
 </div>
 
 <!-- # Muon is Scalable For LLM Training -->
 
 <div align="center">
-  <a href="https://github.com/MoonshotAI/
-  <a href="https://huggingface.co/moonshotai/Moonlight"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a> |
+  <a href="https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf"><img src="figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Tech Report</b></a> |
+  <a href="https://huggingface.co/moonshotai/Moonlight-16B-A3B"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a> |
   <a href="#"><img src="figures/megatron.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;">Megatron(coming soon)</b></a>
 </div>
 
@@ -85,8 +85,8 @@ We compared Moonlight with SOTA public models at similar scale:
 
 | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
 | :------------: | :------------: | :------------: | :------------: | :------------: |
-| Moonlight | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight) |
-| Moonlight-Instruct | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight-Instruct) |
+| Moonlight-16B-A3B | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight-16B-A3B) |
+| Moonlight-16B-A3B-Instruct | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct) |
 
 </div>
 
@@ -94,7 +94,7 @@ We compared Moonlight with SOTA public models at similar scale:
 
 We introduce how to use our model at inference stage using transformers library. It is recommended to use python=3.10, torch>=2.1.0, and the latest version of transformers as the development environment.
 
-For our pretrained model (Moonlight):
+For our pretrained model (Moonlight-16B-A3B):
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
@@ -113,7 +113,7 @@ generated_ids = model.generate(**inputs, max_new_tokens=100)
 response = tokenizer.batch_decode(generated_ids)[0]
 ```
 
-For our instruct model (Moonlight-Instruct):
+For our instruct model (Moonlight-16B-A3B-Instruct):
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
````
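
The diff only shows fragments of the README's inference snippet. For context, a minimal self-contained sketch of the pretrained-model usage it describes, assuming the Hugging Face repo id `moonshotai/Moonlight-16B-A3B` from the links above, that the repo requires `trust_remote_code=True`, and an illustrative prompt (none of which are confirmed by the visible diff lines):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "moonshotai/Moonlight-16B-A3B"

# Load the checkpoint; torch_dtype="auto" keeps the dtype stored in the repo,
# and device_map="auto" spreads the 16B weights across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # assumption: the repo ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "1+1=2, 1+2="  # illustrative prompt, not from the diff
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The diff's visible lines: generate up to 100 new tokens, then decode.
generated_ids = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
```

The instruct variant (`moonshotai/Moonlight-16B-A3B-Instruct`) would be loaded the same way, typically with the tokenizer's chat template applied to the messages before generation.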