dkarthikeyan1 committed on
Commit
7afa636
·
verified ·
1 Parent(s): 8da6194

Update README.md

  1. README.md +7 -11
README.md CHANGED
@@ -20,14 +20,14 @@ It is released along with [this paper](google.com).
20
 
21
  ## Intended uses & limitations
22
 
23
- This model is designed for auto-regressively generating CDR3$\beta$ sequences against a pMHC of interest.
24
  This means that the model assumes a plausible pMHC is provided as input. We have not tested the model on peptides and MHC sequences
25
  where the peptide-MHC binding affinity is low, and we do not expect the model to adjust its predictions accordingly.
26
  This model is intended for academic purposes and should not be used in a clinical setting.
27
 
28
  ### How to use
29
 
30
- You can use this model directly for conditional CDR3$\b$ generation:
31
 
32
  ```python
33
  import re
@@ -61,7 +61,7 @@ cdr3b_sequences = [re.sub(r'\[.*\]', '', x) for x in tokenizer.batch_decode(mode
61
  'CASSLGTGGNQPQHF']
62
  ```
63
 
64
- This model can also be used for unconditional generation of CDR3$\beta$ sequences:
65
 
66
  ```python
67
  import re
@@ -115,8 +115,8 @@ corpus of ~330k TCR:peptide-pseudosequence pairs taken from [VDJdb](https://vdjd
115
 
116
  ### Preprocessing
117
 
118
- All amino acid sequences, and V/J gene names were standardized using the \texttt{`tidytcells'} package. See [here](https://pmc.ncbi.nlm.nih.gov/articles/PMC10634431/). MHC
119
- allele information was standardized using \texttt{`mhcgnomes'}, available [here](https://pypi.org/project/mhcgnomes/) before mapping allele information to the MHC pseudo-sequence
120
  as defined in [NetMHCpan](https://pmc.ncbi.nlm.nih.gov/articles/PMC3319061/).
121
 
122
  ### Pre-training
@@ -150,12 +150,8 @@ Masks 'mlm_probability' tokens grouped into spans of size 'max_span_length' acco
150
 
151
  ### Finetuning
152
 
153
- TCRT5 was finetuned on peptide-pseudo sequence -> CDR3$\beta$ source:target pairs using the canonical cross entropy loss:
154
 
155
- $$
156
- \mathcal{L} = CE(\textbf{y}, \hat{\textbf{y}}) = - \sum_{i=1}^n \textbf{y}_i \log \hat{\textbf{y}}_i
157
- = - \sum_{i=1}^n \sum_{j=1}^k y_{ij} \log p_\theta (y_{ij} | \textbf{x})
158
- $$
159
 
160
  ```
161
  Example Input:
@@ -171,7 +167,7 @@ $$
171
 
172
  ## Results
173
 
174
- This fine-tuned model achieves the following results on conditional CDR3$\beta$ generation on our validation set of the top-20 peptide-MHCs with the most abundant known TCRs (in alphabetical order):
175
 
176
  1. AVFDRKSDAK_A*11:01
177
  2. CRVRLCCYVL_C*07:02
 
20
 
21
  ## Intended uses & limitations
22
 
23
+ This model is designed for auto-regressively generating CDR3 \\(\beta\\) sequences against a pMHC of interest.
24
  This means that the model assumes a plausible pMHC is provided as input. We have not tested the model on peptides and MHC sequences
25
  where the peptide-MHC binding affinity is low, and we do not expect the model to adjust its predictions accordingly.
26
  This model is intended for academic purposes and should not be used in a clinical setting.
27
 
28
  ### How to use
29
 
30
+ You can use this model directly for conditional CDR3 \\(\beta\\) generation:
31
 
32
  ```python
33
  import re
 
61
  'CASSLGTGGNQPQHF']
62
  ```
63
 
64
+ This model can also be used for unconditional generation of CDR3 \\(\beta\\) sequences:
65
 
66
  ```python
67
  import re
 
115
 
116
  ### Preprocessing
117
 
118
+ All amino acid sequences and V/J gene names were standardized using the `tidytcells` package. See [here](https://pmc.ncbi.nlm.nih.gov/articles/PMC10634431/). MHC
119
+ allele information was standardized using `mhcgnomes`, available [here](https://pypi.org/project/mhcgnomes/) before mapping allele information to the MHC pseudo-sequence
120
  as defined in [NetMHCpan](https://pmc.ncbi.nlm.nih.gov/articles/PMC3319061/).
121
 
122
  ### Pre-training
 
150
 
151
  ### Finetuning
152
 
153
+ TCRT5 was finetuned on peptide-pseudo sequence -> CDR3 \\(\beta\\) source:target pairs using the canonical cross entropy loss.
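
As a rough illustration of what a token-level cross-entropy loss computes, the sketch below scores a target sequence against per-position logits using only the Python standard library. It is not the training code used for TCRT5: the function names, the toy vocabulary of three tokens, and the choice to average (rather than sum) over positions are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_cross_entropy(logits_per_step, target_ids):
    """Mean negative log-likelihood of the target tokens.

    logits_per_step: one list of vocabulary scores per target position.
    target_ids: index of the correct token at each position.
    Averaging over positions is an assumption of this sketch.
    """
    if len(logits_per_step) != len(target_ids):
        raise ValueError("need exactly one logit vector per target token")
    nll = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        probs = softmax(logits)
        nll -= math.log(probs[target])
    return nll / len(target_ids)


# Hypothetical toy example: a vocabulary of 3 tokens and a 2-token target.
toy_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
toy_targets = [0, 1]
toy_loss = sequence_cross_entropy(toy_logits, toy_targets)
```

With one-hot targets, only the log-probability of the correct token contributes at each position, so the loss shrinks toward zero as the model puts more probability mass on the observed CDR3 residues.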
154
 
 
 
 
 
155
 
156
  ```
157
  Example Input:
 
167
 
168
  ## Results
169
 
170
+ This fine-tuned model achieves the following results on conditional CDR3 \\(\beta\\) generation on our validation set of the top-20 peptide-MHCs with the most abundant known TCRs (in alphabetical order):
171
 
172
  1. AVFDRKSDAK_A*11:01
173
  2. CRVRLCCYVL_C*07:02