---
tags:
  - protein
  - small-molecule
  - dti
  - ibm
  - mammal
  - pytorch
  - transformers
library_name: biomed
license: apache-2.0
base_model:
  - ibm/biomed.omics.bl.sm.ma-ted-400m
---

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery.
This is an example of fine-tuning ibm/biomed.omics.bl.sm.ma-ted-400m for this task:
predicting binding affinity as pKd, the negative logarithm of the dissociation constant (Kd), which reflects the strength of the interaction between a small molecule (drug) and a protein (target).
The expected inputs for the model are the amino acid sequence of the target and the SMILES representation of the drug.
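For reference, here is the label definition under the standard pKd convention (Kd expressed in molar units) with two worked conversions. Higher pKd means tighter binding.

```latex
\mathrm{p}K_d = -\log_{10}\!\left(\frac{K_d}{1\,\mathrm{M}}\right)

K_d = 10\,\mathrm{nM} = 10^{-8}\,\mathrm{M} \;\Rightarrow\; \mathrm{p}K_d = 8
K_d = 1\,\mu\mathrm{M} = 10^{-6}\,\mathrm{M} \;\Rightarrow\; \mathrm{p}K_d = 6
```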

The benchmark used for fine-tuning is defined at https://tdcommons.ai/multi_pred_tasks/dti/.
We also harmonize the affinity values using `data.harmonize_affinities(mode='max_affinity')` and transform them to log scale.
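A minimal plain-Python sketch of what harmonization plus the log transform does, under stated assumptions: each measurement is a `(drug, target, kd_nM)` tuple with Kd in nanomolar, and `mode='max_affinity'` is taken to mean keeping the tightest binder (lowest Kd) when a drug-target pair was measured more than once. `harmonize_max_affinity` and `kd_nm_to_pkd` are hypothetical helpers written for illustration, not TDC functions; check the TDC documentation for the exact behavior.

```python
import math


def harmonize_max_affinity(records):
    """Collapse duplicate (drug, target) measurements into one value.

    Assumption: lower Kd means higher affinity, so "max affinity" keeps the
    minimum Kd observed for each pair.
    """
    best = {}
    for drug, target, kd_nm in records:
        key = (drug, target)
        if key not in best or kd_nm < best[key]:
            best[key] = kd_nm
    return best


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert Kd in nanomolar to pKd = -log10(Kd in molar)."""
    return -math.log10(kd_nm * 1e-9)


# Toy measurements (made-up values): the first pair was measured twice
records = [
    ("CCO", "MKT", 100.0),
    ("CCO", "MKT", 10.0),  # tighter duplicate measurement, kept by harmonization
    ("c1ccccc1", "MKT", 1000.0),
]
harmonized = harmonize_max_affinity(records)
labels = {pair: kd_nm_to_pkd(kd) for pair, kd in harmonized.items()}
print(labels)
```

Harmonizing before the log transform means each drug-target pair contributes exactly one regression label, so duplicated measurements cannot be weighted twice or land on both sides of a split.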
By default, we use the Drug+Target cold split provided by TDC (tdcommons.ai).
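To make the cold-split idea concrete, here is a simplified sketch of a split that is cold on both drugs and targets. `drug_target_cold_split` is a hypothetical helper, not TDC's implementation; the test fraction, the seeding, and discarding pairs that mix a seen and an unseen entity are assumptions chosen for illustration.

```python
import random


def drug_target_cold_split(pairs, test_frac=0.2, seed=0):
    """Illustrative drug+target cold split over (drug, target) pairs.

    Unique drugs and unique targets are each assigned to train or test.
    A pair goes to test only if both its drug and its target are test
    entities, to train only if both are train entities; mixed pairs are
    dropped, so no test drug or test target appears during training.
    """
    rng = random.Random(seed)
    drugs = sorted({d for d, _ in pairs})
    targets = sorted({t for _, t in pairs})
    rng.shuffle(drugs)
    rng.shuffle(targets)
    test_drugs = set(drugs[: max(1, int(len(drugs) * test_frac))])
    test_targets = set(targets[: max(1, int(len(targets) * test_frac))])

    train, test = [], []
    for d, t in pairs:
        if d in test_drugs and t in test_targets:
            test.append((d, t))
        elif d not in test_drugs and t not in test_targets:
            train.append((d, t))
        # otherwise: one entity is seen and one is unseen, so the pair is discarded
    return train, test


# 10 drugs x 5 targets, every combination measured (toy data)
pairs = [(f"D{i}", f"T{j}") for i in range(10) for j in range(5)]
train, test = drug_target_cold_split(pairs)
print(len(train), len(test))  # 32 2 (the remaining 16 mixed pairs are dropped)
```

This setting is harder than a random split: every test prediction involves a drug and a target that the model never saw during fine-tuning.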

Model Summary

Usage

Using ibm/biomed.omics.bl.sm.ma-ted-400m requires installing the MAMMAL package from https://github.com/BiomedSciAI/biomed-multi-alignment:

```shell
pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git#egg=mammal[examples]
```

A simple example of a task already supported by ibm/biomed.omics.bl.sm.ma-ted-400m:

```python
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp

from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
from mammal.model import Mammal

# Inputs: the target protein as an amino acid sequence and the drug as a SMILES string
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"

# Load the fine-tuned model and switch it to inference mode
model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")
model.eval()

# Load the matching tokenizer
tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-400m.dti_bindingdb_pkd")

# Convert the raw inputs to a MAMMAL-style sample dict
sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
sample_dict = DtiBindingdbKdTask.data_preprocessing(
    sample_dict=sample_dict,
    tokenizer_op=tokenizer_op,
    target_sequence_key="target_seq",
    drug_sequence_key="drug_seq",
    norm_y_mean=None,
    norm_y_std=None,
    device=model.device,
)

# Forward pass in encoder-only mode, which supports scalar predictions
batch_dict = model.forward_encoder_only([sample_dict])

# Post-process the model's output, mapping the normalized scalar back to the
# pKd scale with the label mean/std used during fine-tuning
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=5.79384684128215,
    norm_y_std=1.33808027428196,
)
ans = {
    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
}

# Print the predicted pKd
print(f"{ans=}")
```
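The `norm_y_mean` and `norm_y_std` arguments suggest the model was fine-tuned on z-score-normalized pKd labels. Assuming `process_model_output` inverts that normalization as `pKd = z * std + mean` (an assumption, not confirmed against the MAMMAL source), the mapping looks like the sketch below. `normalize_pkd` and `denormalize_pkd` are hypothetical helpers, not part of the MAMMAL API; the constants are the values passed in the usage example.

```python
# Label statistics passed to process_model_output in the usage example
PKD_MEAN = 5.79384684128215
PKD_STD = 1.33808027428196


def normalize_pkd(pkd: float, mean: float = PKD_MEAN, std: float = PKD_STD) -> float:
    """Z-score a pKd label (assumed training-time transform)."""
    return (pkd - mean) / std


def denormalize_pkd(z: float, mean: float = PKD_MEAN, std: float = PKD_STD) -> float:
    """Map a normalized model output back to the pKd scale (assumed inverse)."""
    return z * std + mean


# A raw output of 0 maps to the mean training pKd
print(denormalize_pkd(0.0))
# Round trip: normalizing and then de-normalizing recovers the original label
print(denormalize_pkd(normalize_pkd(8.0)))
```

Under this assumption, a raw output of +1 corresponds to a prediction one standard deviation (about 1.34 pKd units) above the training mean.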

For more advanced usage, see our detailed example at https://github.com/BiomedSciAI/biomed-multi-alignment

Citation

If you find our work useful, please consider starring the repo and citing our paper:

```bibtex
@article{TBD,
  title={TBD},
  author={IBM Research Team},
  journal={arXiv preprint arXiv:TBD},
  year={2024}
}
```