---
license: mit
library_name: pytorch
tags:
- biology
- microscopy
- text-to-image
- transformers
metrics:
- accuracy
---

# CELL-E 2

## Model description

CELL-E 2 is the second iteration of the original [CELL-E](https://www.biorxiv.org/content/10.1101/2022.05.27.493774v1) model, which utilizes an amino acid sequence and a nucleus image to predict subcellular protein localization with respect to the nucleus. CELL-E 2 is a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and *vice versa*). CELL-E 2 not only captures the spatial complexity of protein localization and produces probability estimates of localization atop a nucleus image, but can also generate sequences from images, enabling *de novo* protein design. We trained on the [Human Protein Atlas](https://www.proteinatlas.org) and [OpenCell](https://opencell.czbiohub.org) datasets. CELL-E 2 utilizes pretrained amino acid embeddings from [ESM-2](https://github.com/facebookresearch/esm).

## Model variations

We have made several versions of CELL-E 2 available. The naming scheme follows the structure `training set_hidden size`, where the hidden size is set to the embedding dimension of the pretrained ESM-2 model.

### HPA Models:

| Model | #params | Language |
|------------------------|---------|---------|
| [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) | 110M | English |
| [`bert-large-uncased`](https://huggingface.co/bert-large-uncased) | 340M | English |
| [`bert-base-cased`](https://huggingface.co/bert-base-cased) | 110M | English |
| [`bert-large-cased`](https://huggingface.co/bert-large-cased) | 340M | English |

## Intended uses & limitations

You can use the raw model in either direction: given an amino acid sequence and a nucleus image, it generates an image of predicted protein localization as probability estimates atop the nucleus image; given a localization image, it can generate candidate amino acid sequences, enabling *de novo* protein design. Because the model was trained on the Human Protein Atlas and OpenCell datasets, it is best suited to inputs that resemble those imaging conditions.

### How to use

Here is how to use this model to generate a predicted localization image for a given protein sequence and nucleus image in PyTorch (a fuller, hedged sketch appears at the end of this card):

```python
from omegaconf import OmegaConf

# `instantiate_from_config`, `device`, `sequence`, and `nucleus` come from the CELL-E 2 codebase / user code.
configs = OmegaConf.load("configs/config.yaml")
model = instantiate_from_config(configs.model).to(device)
model.sample(text=sequence, condition=nucleus)
```

### BibTeX entry and citation info

```bibtex
@article{khwaja_celle_2,
  author = {Emaad Khwaja and Yun S Song and Aaron Agarunov and Bo Huang},
  title  = {{CELL-E 2:} Translating Proteins to Pictures and Back with a Bidirectional Text-to-Image Transformer},
}
```
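
### Extended usage sketch

For convenience, here is a slightly fuller sketch of the usage pattern from the "How to use" section above. This is a minimal illustration, not the official interface: the `celle_main` import path, the config and image file paths, the 256x256 grayscale preprocessing, and the example sequence are all assumptions; consult the CELL-E 2 configuration files and source code for the authoritative usage.

```python
import torch
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms

from celle_main import instantiate_from_config  # hypothetical import path; use the repository's actual module

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the model from its training config (path is an assumption) and move it to the device.
configs = OmegaConf.load("configs/config.yaml")
model = instantiate_from_config(configs.model).to(device).eval()

# Protein of interest as a plain amino acid sequence string (illustrative only).
sequence = "MGSSHHHHHHSSGLVPRGSHM"

# Nucleus image used as the spatial condition; grayscale 256x256 is an assumed input size.
preprocess = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
nucleus = preprocess(Image.open("nucleus.png")).unsqueeze(0).to(device)

# Generate a predicted localization image conditioned on the nucleus image.
with torch.no_grad():
    prediction = model.sample(text=sequence, condition=nucleus)
```

The only call taken from the card itself is `model.sample(text=..., condition=...)`; everything else (preprocessing, tensor shapes, and how the returned prediction should be interpreted) should be checked against the repository.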