---
license: mit
library_name: pytorch
tags:
- biology
- microscopy
- text-to-image
- transformers
metrics:
- accuracy
---

# CELL-E 2

## Model description

CELL-E 2 is the second iteration of the original [CELL-E](https://www.biorxiv.org/content/10.1101/2022.05.27.493774v1) model, which utilizes an amino acid sequence and a nucleus image to predict subcellular protein localization with respect to the nucleus. CELL-E 2 is a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and *vice versa*). CELL-E 2 not only captures the spatial complexity of protein localization and produces probability estimates of localization atop a nucleus image, but can also generate sequences from images, enabling *de novo* protein design. We trained on the [Human Protein Atlas](https://www.proteinatlas.org) and [OpenCell](https://opencell.czbiohub.org) datasets. CELL-E 2 utilizes pretrained amino acid embeddings from [ESM-2](https://github.com/facebookresearch/esm).

## Model variations

We have made several versions of CELL-E 2 available. The naming scheme follows the structure `training set_hidden size`, where the hidden size is set to the embedding dimension of the pretrained ESM-2 model.

### HPA Models:

| Model | #params | Language |
|------------------------|---------|---------|
| [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) | 110M | English |
| [`bert-large-uncased`](https://huggingface.co/bert-large-uncased) | 340M | English |
| [`bert-base-cased`](https://huggingface.co/bert-base-cased) | 110M | English |
| [`bert-large-cased`](https://huggingface.co/bert-large-cased) | 340M | English |

## Intended uses & limitations

You can use the raw model in either direction: given an amino acid sequence and a nucleus image, it generates an image of predicted protein localization as probability estimates atop the nucleus image; given a localization image, it can generate candidate amino acid sequences, enabling *de novo* protein design. Because the model was trained on the Human Protein Atlas and OpenCell datasets, it is best suited to inputs that resemble those imaging conditions.

### How to use

Here is how to use this model to generate a predicted localization image for a given protein sequence and nucleus image in PyTorch (a fuller, hedged sketch appears at the end of this card):

```python
from omegaconf import OmegaConf

# `instantiate_from_config`, `device`, `sequence`, and `nucleus` come from the CELL-E 2 codebase / user code.
configs = OmegaConf.load("configs/config.yaml")
model = instantiate_from_config(configs.model).to(device)
model.sample(text=sequence, condition=nucleus)
```

### BibTeX entry and citation info

```bibtex
@article{khwaja_celle_2,
  author = {Emaad Khwaja and Yun S Song and Aaron Agarunov and Bo Huang},
  title  = {{CELL-E 2:} Translating Proteins to Pictures and Back with a Bidirectional Text-to-Image Transformer},
}
```
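
### Extended usage sketch

For convenience, here is a slightly fuller sketch of the usage pattern from the "How to use" section above. This is a minimal illustration, not the official interface: the `celle_main` import path, the config and image file paths, the 256x256 grayscale preprocessing, and the example sequence are all assumptions; consult the CELL-E 2 configuration files and source code for the authoritative usage.

```python
import torch
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms

from celle_main import instantiate_from_config  # hypothetical import path; use the repository's actual module

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the model from its training config (path is an assumption) and move it to the device.
configs = OmegaConf.load("configs/config.yaml")
model = instantiate_from_config(configs.model).to(device).eval()

# Protein of interest as a plain amino acid sequence string (illustrative only).
sequence = "MGSSHHHHHHSSGLVPRGSHM"

# Nucleus image used as the spatial condition; grayscale 256x256 is an assumed input size.
preprocess = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
nucleus = preprocess(Image.open("nucleus.png")).unsqueeze(0).to(device)

# Generate a predicted localization image conditioned on the nucleus image.
with torch.no_grad():
    prediction = model.sample(text=sequence, condition=nucleus)
```

The only call taken from the card itself is `model.sample(text=..., condition=...)`; everything else (preprocessing, tensor shapes, and how the returned prediction should be interpreted) should be checked against the repository.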