Data-Selection/PDS-470M

@@ -1,20 +1,20 @@
 ---
-license: apache-2.0
 datasets:
 - togethercomputer/RedPajama-Data-1T
 language:
 - en
-pipeline_tag: text-generation
 library_name: transformers
 ---
 ## PDS-470M
-[paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection)
-**PDS-470M** is a 470M model with [Mistral](https://arxiv.org/abs/2310.06825) achitecture pre-trained from scratch on the data selected from the CC split of [Redpajama](https://github.com/togethercomputer/RedPajama-Data), using the PDS framework.
-The PDS framework is based on the [Pontryagin's maximum principle](https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle#:~:text=Pontryagin's%20maximum%20principle%20is%20used,the%20state%20or%20input%20controls.) for optimal pre-training data selection, which not only enjoy strong theoretical support but is also scalable for training large language models.
 Please refer to our [paper](https://arxiv.org/abs/2410.07064) for more details.
@@ -51,4 +51,4 @@ PDS-selected data improves the performance of language models pre-trained from s
   journal={arXiv preprint arXiv:2410.07064},
   year={2024}
 }
-```

 ---
 datasets:
 - togethercomputer/RedPajama-Data-1T
 language:
 - en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 ## PDS-470M
+[paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection) | [project page](https://github.com/microsoft/LMOps/tree/main/data_selection)
+**PDS-470M** is a 470M-parameter model with the Mistral architecture, **pretrained from scratch** using the PDS framework on data selected from the CC split of [RedPajama](https://github.com/togethercomputer/RedPajama-Data).
+The PDS framework is based on [Pontryagin's maximum principle](https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle#:~:text=Pontryagin's%20maximum%20principle%20is%20used,the%20state%20or%20input%20controls.) for optimal pre-training data selection, offering strong theoretical support and scalability for training large language models.
 Please refer to our [paper](https://arxiv.org/abs/2410.07064) for more details.
   journal={arXiv preprint arXiv:2410.07064},
   year={2024}
 }
+```