---
tags:
- biology
- small-molecules
- single-cell-genes
- drug-discovery
- drug-target-interaction
- ibm
- mammal
- pytorch

library_name: biomed-multi-alignment
license: apache-2.0
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-458m
---

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery.  
This model is an example of fine-tuning `ibm/biomed.omics.bl.sm.ma-ted-458m` for this task.  
It predicts binding affinity as pKd, the negative logarithm of the dissociation constant (Kd), which reflects the strength of the interaction between a small molecule (drug) and a protein (target).  
The expected inputs for the model are the amino acid sequence of the target and the SMILES representation of the drug.  
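
For reference, pKd is simply the negative base-10 logarithm of Kd expressed in molar units; a minimal illustrative conversion (not part of the model code):

```python
import math

def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant Kd (in molar units) to pKd = -log10(Kd)."""
    return -math.log10(kd_molar)

# Example: a 10 nM binder corresponds to pKd = 8.0
print(kd_to_pkd(10e-9))  # 8.0
```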

The benchmark used for fine-tuning is defined at `https://tdcommons.ai/multi_pred_tasks/dti/`.  
The affinity values are harmonized with `data.harmonize_affinities(mode='max_affinity')` and transformed to log scale.  
By default, we use the Drug+Target cold split provided by TDC.
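
A minimal sketch of how this benchmark could be prepared with the TDC package, assuming the standard `tdc.multi_pred.DTI` loader (the exact preprocessing used for fine-tuning may differ):

```python
from tdc.multi_pred import DTI

# Load the BindingDB Kd drug-target interaction dataset from TDC
data = DTI(name="BindingDB_Kd")

# Merge duplicate drug-target pairs, keeping the strongest reported affinity
data.harmonize_affinities(mode="max_affinity")

# Convert affinity values to log scale (pKd-like transformation)
data.convert_to_log(form="binding")

# Drug+Target cold split: drugs and targets in valid/test are unseen in train
split = data.get_split(method="cold_split", column_name=["Drug", "Target"])
train_df, valid_df, test_df = split["train"], split["valid"], split["test"]
```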


## Model Summary

- **Developers:** IBM Research
- **GitHub Repository:** https://github.com/BiomedSciAI/biomed-multi-alignment
- **Paper:** https://arxiv.org/abs/2410.22367
- **Release Date**: Oct 28th, 2024
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Usage

Using `ibm/biomed.omics.bl.sm.ma-ted-458m` requires installing the https://github.com/BiomedSciAI/biomed-multi-alignment package:

```bash
pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git#egg=mammal[examples]
```

A simple example of running the DTI binding-affinity task with the fine-tuned checkpoint `ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd`:
```python
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp

from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
from mammal.keys import CLS_PRED, SCORES
from mammal.model import Mammal

# Input: target amino acid sequence and drug SMILES string
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"

# Load Model
model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd")
model.eval()

# Load Tokenizer
tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd")

# convert to MAMMAL style
sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
sample_dict = DtiBindingdbKdTask.data_preprocessing(
    sample_dict=sample_dict,
    tokenizer_op=tokenizer_op,
    target_sequence_key="target_seq",
    drug_sequence_key="drug_seq",
    norm_y_mean=None,
    norm_y_std=None,
    device=model.device,
)

# forward pass - encoder_only mode which supports scalar predictions
batch_dict = model.forward_encoder_only([sample_dict])

# Post-process the model's output
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=5.79384684128215,
    norm_y_std=1.33808027428196,
)
ans = {
    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
}

# Print prediction
print(f"{ans=}")
```
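
Since `forward_encoder_only` takes a list of preprocessed samples, the same pipeline can plausibly score several drug-target pairs at once; the sketch below continues from the snippet above and assumes that passing more than one sample per call is supported (not confirmed by this card):

```python
# Hypothetical batch-scoring sketch: preprocess each pair, then run one forward pass.
# Assumes forward_encoder_only accepts a list with multiple samples (not verified here).
pairs = [
    {"target_seq": target_seq, "drug_seq": drug_seq},
    {"target_seq": target_seq, "drug_seq": "CCO"},  # second drug is purely illustrative
]
samples = [
    DtiBindingdbKdTask.data_preprocessing(
        sample_dict=p,
        tokenizer_op=tokenizer_op,
        target_sequence_key="target_seq",
        drug_sequence_key="drug_seq",
        norm_y_mean=None,
        norm_y_std=None,
        device=model.device,
    )
    for p in pairs
]
batch_dict = model.forward_encoder_only(samples)
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=5.79384684128215,
    norm_y_std=1.33808027428196,
)
preds = [float(p) for p in batch_dict["model.out.dti_bindingdb_kd"]]
print(preds)
```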

For more advanced usage, see our detailed example at `https://github.com/BiomedSciAI/biomed-multi-alignment`.


## Citation

If you found our work useful, please consider giving a star to the repo and citing our paper:
```
@misc{shoshan2024mammalmolecularaligned,
      title={MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language}, 
      author={Yoel Shoshan and Moshiko Raboh and Michal Ozery-Flato and Vadim Ratner and Alex Golts and Jeffrey K. Weber and Ella Barkan and Simona Rabinovici-Cohen and Sagi Polaczek and Ido Amos and Ben Shapira and Liam Hazan and Matan Ninio and Sivan Ravid and Michael M. Danziger and Joseph A. Morrone and Parthasarathy Suryanarayanan and Michal Rosen-Zvi and Efrat Hexter},
      year={2024},
      eprint={2410.22367},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM},
      url={https://arxiv.org/abs/2410.22367}, 
}
```