# Model documentation & parameters

**Algorithm Version**: Which model checkpoint to use (trained on different datasets).

**Scaffolds**: One or multiple scaffolds, provided as '.'-separated SMILES. If empty, no scaffolds are used. Note that this is a hard constraint, i.e., the scaffold is guaranteed to be present in the generated molecule. If multiple scaffolds are given, they are paired with the seed SMILES (if applicable) and every generated molecule is guaranteed to contain exactly one scaffold.

**Seed SMILES**: One or multiple seed molecules, provided as '.'-separated SMILES. If empty, no seeds are used. There is no guarantee that a seed SMILES (or a substructure of it) will be present in the generated molecule, as it is merely used for decoder initialization.

**Number of samples**: How many samples should be generated (between 1 and 50).

**Beam size**: Beam size used in beam search decoding (higher values are slower but give better results).

**Sigma**: Variance of the Gaussian noise that is added to the latent code (before passing it to the decoder).

**Seed**: The random seed used for initialization.
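To make the parameters above concrete, below is a minimal sketch of how they roughly map onto the Python API of the upstream [`molecule-generation`](https://github.com/microsoft/molecule-generation) package. The checkpoint directory, the example SMILES, and the exact keyword names (e.g. `scaffolds` in `decode`; how beam size is exposed is not shown) are assumptions for illustration only; consult the upstream README for the authoritative interface.

```python
"""Hedged sketch: scaffold-constrained generation with the upstream
molecule_generation package. Paths, SMILES, and keyword names below are
placeholders and may differ between package versions."""
import numpy as np
from molecule_generation import load_model_from_directory

MODEL_DIR = "./moler_checkpoint"      # placeholder: directory holding a trained MoLeR checkpoint
SEED_SMILES = ["c1ccc2ccccc2c1"]      # seed molecule: only initializes the latent code
SCAFFOLDS = ["c1ccccc1"]              # hard constraint: must appear in every generated molecule
NUM_SAMPLES = 10                      # "Number of samples"
SIGMA = 0.1                           # "Sigma": variance of the Gaussian noise added to the latent code
RNG = np.random.default_rng(seed=42)  # "Seed"

with load_model_from_directory(MODEL_DIR) as model:
    # Encode the seed SMILES into latent vectors, then perturb them with
    # Gaussian noise so repeated decodings yield diverse molecules.
    latents = np.asarray(model.encode(SEED_SMILES))
    latents = np.repeat(latents, NUM_SAMPLES, axis=0)
    latents = latents + RNG.normal(scale=np.sqrt(SIGMA), size=latents.shape)

    # Decode, constraining every output to contain the scaffold.
    molecules = model.decode(list(latents), scaffolds=SCAFFOLDS * NUM_SAMPLES)
    print(molecules)

    # Unconditional sampling from the prior (no scaffold, no seed) is also
    # available upstream, e.g. model.sample(NUM_SAMPLES).
```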
# Model card

**Model Details**: MoLeR is a graph-based molecular generative model that can be conditioned (primed) on scaffolds. The model decorates scaffolds with realistic structural motifs.

**Developers**: Krzysztof Maziarz and co-authors from Microsoft Research and Novartis (full reference at the bottom).

**Distributors**: The original authors' code wrapped and distributed by the GT4SD Team (2023) from IBM Research.

**Model date**: Released around March 2022.

**Model version**: Model provided by the original authors, see [their GitHub repo](https://github.com/microsoft/molecule-generation).

**Model type**: An encoder-decoder-based GNN for molecular generation.

**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: Trained by the original authors with the default parameters provided [on GitHub](https://github.com/microsoft/molecule-generation).

**Paper or other resource for more information**: [Learning to Extend Molecular Scaffolds with Structural Motifs (ICLR 2022)](https://openreview.net/forum?id=ZTsoE8G3GG).

**License**: MIT

**Where to send questions or comments about the model**: Open an issue on the original authors' [GitHub repository](https://github.com/microsoft/molecule-generation).

**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.

**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.

**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.

**Factors**: Not applicable.

**Metrics**: Validation loss on decoding correct molecules. Evaluated on several downstream tasks.

**Datasets**: 1.5M drug-like molecules from the GuacaMol benchmark. Fine-tuning on 20 molecular optimization tasks from GuacaMol.

**Ethical Considerations**: Unclear, please consult with the original authors in case of questions.

**Caveats and Recommendations**: Unclear, please consult with the original authors in case of questions.

Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)

## Citation

```bib
@inproceedings{maziarz2021learning,
  author    = {Krzysztof Maziarz and Henry Richard Jackson{-}Flux and Pashmina Cameron and Finton Sirockin and Nadine Schneider and Nikolaus Stiefl and Marwin H. S. Segler and Marc Brockschmidt},
  title     = {Learning to Extend Molecular Scaffolds with Structural Motifs},
  booktitle = {The Tenth International Conference on Learning Representations, {ICLR}},
  year      = {2022}
}
```