HuangLab/CELL-E_2_HPA_2560


Emaad committed on May 16, 2023

Commit a77da45 (1 parent: 801881e)

Update README.md

Files changed (1)

README.md +62 -1


@@ -8,4 +8,65 @@ tags:
 - transformers
 metrics:
 - accuracy
----
+---
+# CELL-E 2
+## Model description
+CELL-E 2 is the second iteration of the original [CELL-E](https://www.biorxiv.org/content/10.1101/2022.05.27.493774v1) model, which uses an amino acid sequence and a nucleus image to predict subcellular protein localization with respect to the nucleus.
+CELL-E 2 is a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and *vice versa*).
+CELL-E 2 not only captures the spatial complexity of protein localization and produces probability estimates of localization atop a nucleus image, but can also generate sequences from images, enabling *de novo* protein design.
+We trained on the [Human Protein Atlas](https://www.proteinatlas.org) and the [OpenCell](https://opencell.czbiohub.org) datasets.
+CELL-E 2 utilizes pretrained amino acid embeddings from [ESM-2](https://github.com/facebookresearch/esm).
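
As a rough illustration of what probability estimates atop a nucleus image can look like, the sketch below composites a localization probability map onto a nucleus image. It is a hypothetical visualization helper, not part of CELL-E 2: it assumes you already have a 2-D per-pixel probability map with the same shape as the nucleus image, and the function name `overlay_probability` and the color scheme (nucleus in blue, probability in red) are our own choices.

```python
import numpy as np


def overlay_probability(nucleus: np.ndarray, prob: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Composite a localization probability map onto a grayscale nucleus image.

    Hypothetical helper: assumes `nucleus` and `prob` are 2-D arrays of equal shape.
    The nucleus is min-max normalized into the blue channel; the probability map,
    clipped to [0, 1] and scaled by `alpha`, goes into the red channel.
    Returns an (H, W, 3) float RGB array with values in [0, 1].
    """
    if nucleus.shape != prob.shape:
        raise ValueError("nucleus and prob must have the same shape")
    span = nucleus.max() - nucleus.min()
    if span > 0:
        nuc = (nucleus - nucleus.min()) / span
    else:
        # A constant image carries no structure; show it as black.
        nuc = np.zeros_like(nucleus, dtype=float)
    rgb = np.zeros(nucleus.shape + (3,), dtype=float)
    rgb[..., 2] = nuc                                # nucleus -> blue channel
    rgb[..., 0] = alpha * np.clip(prob, 0.0, 1.0)    # probability -> red channel
    return rgb
```

The returned array is a float RGB image in [0, 1], so it can be displayed directly with `matplotlib.pyplot.imshow`.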
+## Model variations
+We have made several versions of CELL-E 2 available. The naming scheme follows the structure `training set_hidden size`, where the hidden size is set to the embedding dimension of the pretrained ESM-2 model.
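
The naming scheme can be decoded mechanically. Below is a minimal sketch with a hypothetical helper, `parse_variant`: it assumes names of the form `<training set>_<hidden size>` (optionally prefixed with `CELL-E_2_`, as in this repository's name) and maps the hidden size to the public ESM-2 checkpoint with that embedding dimension, using the `fair-esm` model names. That lookup table is our own reference, not something CELL-E 2 ships.

```python
# Embedding dimension of each public ESM-2 checkpoint -> its fair-esm model name.
# This table is our own reference, not part of CELL-E 2.
ESM2_BY_DIM = {
    320: "esm2_t6_8M_UR50D",
    480: "esm2_t12_35M_UR50D",
    640: "esm2_t30_150M_UR50D",
    1280: "esm2_t33_650M_UR50D",
    2560: "esm2_t36_3B_UR50D",
    5120: "esm2_t48_15B_UR50D",
}


def parse_variant(name: str) -> tuple[str, int, str]:
    """Split a name like 'HPA_2560' into (training set, hidden size, ESM-2 checkpoint).

    Hypothetical helper; a leading 'CELL-E_2_' (full repository name) is stripped first.
    """
    name = name.removeprefix("CELL-E_2_")
    training_set, hidden = name.rsplit("_", 1)
    hidden_size = int(hidden)
    if hidden_size not in ESM2_BY_DIM:
        raise ValueError(f"no ESM-2 checkpoint has embedding dimension {hidden_size}")
    return training_set, hidden_size, ESM2_BY_DIM[hidden_size]
```

Under these assumptions, `parse_variant("CELL-E_2_HPA_2560")` returns `("HPA", 2560, "esm2_t36_3B_UR50D")`: this checkpoint pairs Human Protein Atlas training data with the 2560-dimensional embeddings of the 3B-parameter ESM-2 model.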
+### HPA Models:
+| Model | Training set | Hidden size |
+|-------|--------------|-------------|
+| [`CELL-E_2_HPA_2560`](https://huggingface.co/HuangLab/CELL-E_2_HPA_2560) | HPA | 2560 |
+## Intended uses & limitations
+You can use this model to generate images of predicted subcellular protein localization from an amino acid sequence and a nucleus image, or in the reverse direction to generate amino acid sequences from images for *de novo* protein design.
+Localization predictions are made with respect to the supplied nucleus image.
+### How to use
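+Here is how to use this model in PyTorch to sample a predicted protein localization image from an amino acid sequence and a nucleus image:
+```python
+import torch
+from omegaconf import OmegaConf
+# instantiate_from_config comes from the CELL-E 2 codebase; sequence and nucleus are your prepared inputs
+device = "cuda" if torch.cuda.is_available() else "cpu"
+configs = OmegaConf.load("configs/config.yaml")
+model = instantiate_from_config(configs.model).to(device)
+model.sample(text=sequence, condition=nucleus)
+```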
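
The usage snippet relies on `instantiate_from_config`, which it does not define. A common pattern for this helper (used in the taming-transformers and latent-diffusion codebases) reads a dotted import path from a `target` key and keyword arguments from a `params` key. The sketch below assumes `configs.model` follows that `target`/`params` layout; check the CELL-E 2 repository for its actual helper before relying on this.

```python
import importlib
from typing import Any, Mapping


def get_obj_from_str(path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names."""
    module_name, attr = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(config: Mapping[str, Any]) -> Any:
    """Construct config['target'], passing config['params'] (if any) as keyword arguments.

    Sketch of the taming-transformers-style helper; the target/params layout is an
    assumption about how the CELL-E 2 config is organized.
    """
    if "target" not in config:
        raise KeyError("expected key 'target' to instantiate")
    return get_obj_from_str(config["target"])(**config.get("params", {}))
```

For example, `instantiate_from_config({"target": "fractions.Fraction", "params": {"numerator": 1, "denominator": 2}})` builds `Fraction(1, 2)`. Under the assumed layout, the `target` in `configs/config.yaml` would name the model class and `params` its constructor arguments, so swapping checkpoints only requires editing the YAML.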
+### BibTeX entry and citation info
+```bibtex
+@article{,
+ author = {Emaad Khwaja and
+ Yun S Song and
+ Aaron Agarunov and
+ Bo Huang},
+ title = {{CELL-E 2:} Translating Proteins to Pictures and Back with a Bidirectional Text-to-Image Transformer},
+}
+```