tags:
- transformers
metrics:
- accuracy
---

# CELL-E 2

## Model description

CELL-E 2 is the second iteration of the original [CELL-E](https://www.biorxiv.org/content/10.1101/2022.05.27.493774v1) model, which uses an amino acid sequence and a nucleus image to predict subcellular protein localization with respect to the nucleus.

CELL-E 2 is a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and *vice versa*).
It not only captures the spatial complexity of protein localization and produces probability estimates of localization atop a nucleus image, but can also generate sequences from images, enabling *de novo* protein design.
We trained on the [Human Protein Atlas](https://www.proteinatlas.org) and [OpenCell](https://opencell.czbiohub.org) datasets.

CELL-E 2 utilizes pretrained amino acid embeddings from [ESM-2](https://github.com/facebookresearch/esm).
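Since the model conditions on these embeddings, it may help to see how they are produced. Here is a minimal sketch using the `fair-esm` package; the 35M-parameter variant shown is illustrative, not necessarily the one a given checkpoint was trained with:

```python
import torch
import esm

# Load a pretrained ESM-2 model (35M parameters; embedding dimension 480)
model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Toy amino acid sequence in single-letter codes
data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    results = model(tokens, repr_layers=[12])
embeddings = results["representations"][12]  # shape: (1, seq_len + 2, 480)
```

The embedding dimension (480 here) is what the hidden size in the naming scheme below refers to.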
## Model variations

We have made several versions of CELL-E 2 available. The naming scheme follows the structure `training set_hidden size`, where the hidden size is set to the embedding dimension of the pretrained ESM-2 model.

### HPA Models:

| Model | Training set | ESM-2 hidden size |
|------------|---------------------------------------------------|------|
| `HPA_480`  | [Human Protein Atlas](https://www.proteinatlas.org) | 480  |
| `HPA_2560` | [Human Protein Atlas](https://www.proteinatlas.org) | 2560 |
## Intended uses & limitations

You can use the model in either direction: generating an image that depicts protein localization as probability estimates atop a provided nucleus image from an amino acid sequence, or generating an amino acid sequence from images, which enables *de novo* protein design.

Note that predictions are made with respect to the provided nucleus image, so outputs depend on the cell morphologies and imaging conditions represented in the training data ([Human Protein Atlas](https://www.proteinatlas.org) and [OpenCell](https://opencell.czbiohub.org)).
### How to use

Here is how to use this model to generate a localization image from an amino acid sequence and a nucleus image in PyTorch. This is a minimal sketch: the `celle_main` import path is an assumption about the CELL-E 2 code base, and `device`, `sequence`, and `nucleus` must be defined beforehand (see the preparation sketch below):

```python
from omegaconf import OmegaConf
from celle_main import instantiate_from_config  # assumed location of the repo's model-loading helper

configs = OmegaConf.load("configs/config.yaml")
model = instantiate_from_config(configs.model).to(device)
model.sample(text=sequence, condition=nucleus)  # sample a localization image for the sequence
```
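The `sequence` and `nucleus` inputs above can be prepared, for example, as follows. This sketch makes assumptions not stated in the repository: the 256×256 grayscale input size, the file name, and the example sequence are all illustrative:

```python
import torch
from PIL import Image
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Amino acid sequence of the query protein (single-letter codes; illustrative)
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

# Load a nucleus image as a (1, 1, 256, 256) float tensor in [0, 1]
to_tensor = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
nucleus = to_tensor(Image.open("nucleus.png")).unsqueeze(0).to(device)
```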
### BibTeX entry and citation info

```bibtex
@article{khwaja2023celle2,
  author = {Emaad Khwaja and
            Yun S. Song and
            Aaron Agarunov and
            Bo Huang},
  title  = {{CELL-E 2}: Translating Proteins to Pictures and Back with a Bidirectional Text-to-Image Transformer},
  year   = {2023},
}
```