Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ tags:
 - biology
 ---
 Chemma-2B is a continually pretrained [gemma-2b](https://huggingface.co/google/gemma-2b) model for organic molecules.
-It is pretrained on
+It is pretrained on [40B tokens covering 110M+ molecules from PubChem](https://huggingface.co/datasets/yerevann/PubChemForLM) as well as their chemical properties
 (molecular weight, synthetic accessibility score, drug-likeness etc.)
 and similarities (Tanimoto distance between ECFP fingerprints).
 
@@ -19,10 +19,10 @@ Example prompts:
 `</s>[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]` will attempt to generate a molecule that has 2.25 SAS score and
 has a 0.62 similarity score to the given molecule.
 
-The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts.
+The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts. See the [code on GitHub](https://github.com/YerevaNN/ChemLactica).
 
-A preprint with the details of the model and an optimization algorithm built on top of this model that sets state-of-the-art on
-and other benchmarks
+A preprint with the details of the model and an optimization algorithm built on top of this model that sets state-of-the-art on
+Practical Molecular Optimization and other benchmarks is [available on arxiv](https://arxiv.org/abs/2407.18897).
 
 Few notes:
 * All queries should start with `</s>` symbol.
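
The example prompt in the README hunk above follows a fixed tag layout, so it may help to see how such a string could be assembled. The following is a minimal sketch only: `build_prompt` and its parameter names are hypothetical helpers, not part of the model's code, and the two-decimal number formatting is an assumption inferred from the single example in the README.

```python
def build_prompt(sas: float, similar_smiles: str, similarity: float) -> str:
    """Assemble a conditional-generation prompt in the layout shown in the README.

    Hypothetical helper: the tag order and the two-decimal formatting are
    inferred from the README's one example, not from a documented spec.
    """
    return (
        "</s>"  # the README notes that all queries should start with </s>
        f"[SAS]{sas:.2f}[/SAS]"  # target synthetic accessibility score
        f"[SIMILAR]{similarity:.2f} {similar_smiles}[/SIMILAR]"  # similarity target and reference molecule
        "[START_SMILES]"  # the model is expected to continue with a SMILES string
    )


# Reproduces the README's example prompt (the reference molecule is the one given there).
example_prompt = build_prompt(
    sas=2.25,
    similar_smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
    similarity=0.62,
)
print(example_prompt)
```

The resulting string would then be tokenized and passed to the model for generation. In an optimization loop of the kind the README mentions, the numeric targets and the reference molecule would presumably be updated between rounds, but that logic is not shown here.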