Clarify Model Description and Add Project Page Link

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -1,20 +1,20 @@
  ---
- license: apache-2.0
  datasets:
  - togethercomputer/RedPajama-Data-1T
  language:
  - en
- pipeline_tag: text-generation
  library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
  ---

  ## PDS-470M

- [paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection)
+ [paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection) | [project page](https://github.com/microsoft/LMOps/tree/main/data_selection)

- **PDS-470M** is a 470M model with [Mistral](https://arxiv.org/abs/2310.06825) achitecture pre-trained from scratch on the data selected from the CC split of [Redpajama](https://github.com/togethercomputer/RedPajama-Data), using the PDS framework.
+ **PDS-470M** is a 470M parameter Mistral architecture model **pretrained from scratch** using the PDS framework on data selected from the CC split of [Redpajama](https://github.com/togethercomputer/RedPajama-Data).

- The PDS framework is based on the [Pontryagin's maximum principle](https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle#:~:text=Pontryagin's%20maximum%20principle%20is%20used,the%20state%20or%20input%20controls.) for optimal pre-training data selection, which not only enjoy strong theoretical support but is also scalable for training large language models.
+ The PDS framework is based on the [Pontryagin's maximum principle](https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle#:~:text=Pontryagin's%20maximum%20principle%20is%20used,the%20state%20or%20input%20controls.) for optimal pre-training data selection, offering strong theoretical support and scalability for training large language models.

  Please refer to our [paper](https://arxiv.org/abs/2410.07064) for more details.

@@ -51,4 +51,4 @@ PDS-selected data improves the performance of language models pre-trained from s
  journal={arXiv preprint arXiv:2410.07064},
  year={2024}
  }
- ```
+ ```
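
In the first hunk, the front-matter change only moves `license` and `pipeline_tag` below `library_name`; no key or value is added or removed. A minimal sketch of how a reviewer might confirm that the reorder leaves the metadata unchanged is below. The helper `parse_front_matter` is hypothetical and written only for this illustration: it handles just the flat `key: value` and `- item` forms that appear in this diff and is not a general YAML parser.

```python
def parse_front_matter(text):
    """Parse a flat front-matter block into a dict.

    Hypothetical helper for illustration only: it supports `key: value`
    pairs and `key:` followed by `- item` lines, nothing else.
    """
    lines = text.strip().splitlines()
    if lines[0].strip() != "---" or lines[-1].strip() != "---":
        raise ValueError("front matter must be delimited by '---' lines")
    data = {}
    current_key = None
    for line in lines[1:-1]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item belonging to the most recent `key:` line.
            data[current_key].append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list follows on the next lines.
            data[current_key] = value if value else []
    return data


# Front matter before the change, copied from the removed side of the diff.
OLD = """---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T
language:
- en
pipeline_tag: text-generation
library_name: transformers
---"""

# Front matter after the change, copied from the added side of the diff.
NEW = """---
datasets:
- togethercomputer/RedPajama-Data-1T
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---"""

old_meta = parse_front_matter(OLD)
new_meta = parse_front_matter(NEW)

# Dict equality ignores insertion order, so this compares content only.
same_metadata = old_meta == new_meta
print(same_metadata)
```

Comparing the parsed dictionaries rather than the raw text is what makes the check independent of key order; a real review would more likely use a full YAML parser for the same comparison.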