---
tags:
- protein
- small-molecule
- dti
- ibm
- mammal
- pytorch
- transformers
library_name: biomed
license: apache-2.0
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-400m
---

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery. This model is an example of fine-tuning `ibm/biomed.omics.bl.sm.ma-ted-400m` for this task.

Binding affinity is predicted as pKd, the negative logarithm of the dissociation constant (Kd), which reflects the strength of the interaction between a small molecule (drug) and a protein (target).

The expected inputs for the model are the amino acid sequence of the target and the SMILES representation of the drug.

The benchmark used for fine-tuning is defined at `https://tdcommons.ai/multi_pred_tasks/dti/`.
We also harmonize the values using `data.harmonize_affinities(mode='max_affinity')` and transform them to log scale.
By default, we use the Drug+Target cold split provided by tdcommons.

## Model Summary

- **Developers:** IBM Research
- **GitHub Repository:** https://github.com/BiomedSciAI/biomed-multi-alignment
- **Paper:** TBD
- **Release Date:** Oct 28th, 2024
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

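To make the pKd scale described in the introduction concrete, here is a minimal sketch of the conversion between a dissociation constant and pKd. It assumes Kd is given in nanomolar (a common reporting unit) and converts it to mol/L before taking the logarithm. The helper names `kd_nm_to_pkd` and `pkd_to_kd_nm` are hypothetical and are not part of the `mammal` package or of tdcommons, whose log transform may differ in detail.

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L).

    Hypothetical helper for illustration; assumes Kd is reported in nM.
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm * 1e-9)


def pkd_to_kd_nm(pkd: float) -> float:
    """Inverse of kd_nm_to_pkd: convert pKd back to Kd in nM."""
    return 10 ** (-pkd) * 1e9


if __name__ == "__main__":
    # A tighter binder has a smaller Kd and therefore a larger pKd.
    for kd in (1.0, 1000.0):
        print(f"Kd = {kd} nM -> pKd = {kd_nm_to_pkd(kd):.2f}")
```

Under this convention a 1 nM binder has a pKd of about 9 and a 1 µM binder a pKd of about 6, so higher predicted values indicate stronger binding.
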
## Usage

Using `ibm/biomed.omics.bl.sm.ma-ted-400m` requires installing [https://github.com/BiomedSciAI/biomed-multi-alignment](https://github.com/BiomedSciAI/biomed-multi-alignment)

```
pip install "git+https://github.com/BiomedSciAI/biomed-multi-alignment.git#egg=mammal[examples]"
```

A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-400m`:

```python
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp

from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
from mammal.model import Mammal

# Input: target amino acid sequence and drug SMILES string
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"

# Load the fine-tuned model and switch to inference mode
model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")
model.eval()

# Load the matching tokenizer
tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")

# Convert the raw inputs to a MAMMAL-style sample
sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
sample_dict = DtiBindingdbKdTask.data_preprocessing(
    sample_dict=sample_dict,
    tokenizer_op=tokenizer_op,
    target_sequence_key="target_seq",
    drug_sequence_key="drug_seq",
    norm_y_mean=None,
    norm_y_std=None,
    device=model.device,
)

# Forward pass in encoder-only mode, which supports scalar predictions
batch_dict = model.forward_encoder_only([sample_dict])

# Post-process the model's output using the label normalization statistics
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=5.79384684128215,
    norm_y_std=1.33808027428196,
)
ans = {
    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
}

# Print the predicted pKd
print(f"{ans=}")
```

For more advanced usage, see our detailed example at `https://github.com/BiomedSciAI/biomed-multi-alignment`.

## Citation

If you find our work useful, please consider starring the repo and citing our paper:

```
@article{TBD,
  title={TBD},
  author={IBM Research Team},
  journal={arXiv preprint arXiv:TBD},
  year={2024}
}
```