hdallatorre committed on
Commit
04a4424
1 Parent(s): dd7fd5c

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -13,7 +13,7 @@ datasets:
 
  The Nucleotide Transformers are a collection of foundational language models that were pre-trained on DNA sequences from whole-genomes. Compared to other approaches, our models do not only integrate information from single reference genomes, but leverage DNA sequences from over 3,200 diverse human genomes, as well as 850 genomes from a wide range of species, including model and non-model organisms. Through robust and extensive evaluation, we show that these large models provide extremely accurate molecular phenotype prediction compared to existing methods
 
- Part of this collection is the **nucleotide-transformer-2.5b-multi-species**, a 2.5B parameters transformer pre-trained on a collection of 850 genomes from a wide range of species, including model and non-model organisms.
 
  **Developed by:** InstaDeep, NVIDIA and TUM
 
@@ -27,6 +27,12 @@ Part of this collection is the **nucleotide-transformer-2.5b-multi-species**, a
  ### How to use
 
  <!-- Need to adapt this section to our model. Need to figure out how to load the models from huggingface and do inference on them -->
  ```python
  from transformers import AutoTokenizer, AutoModelForMaskedLM
  import torch
 
 
  The Nucleotide Transformers are a collection of foundational language models that were pre-trained on DNA sequences from whole-genomes. Compared to other approaches, our models do not only integrate information from single reference genomes, but leverage DNA sequences from over 3,200 diverse human genomes, as well as 850 genomes from a wide range of species, including model and non-model organisms. Through robust and extensive evaluation, we show that these large models provide extremely accurate molecular phenotype prediction compared to existing methods
 
+ Part of this collection is the **nucleotide-transformer-2.5b-multi-species**, a 2.5B parameters transformer pre-trained on a collection of 850 genomes from a wide range of species, including model and non-model organisms. The model is made available both in Tensorflow and Pytorch.
 
  **Developed by:** InstaDeep, NVIDIA and TUM
 
 
  ### How to use
 
  <!-- Need to adapt this section to our model. Need to figure out how to load the models from huggingface and do inference on them -->
+ Until its next release, the `transformers` library needs to be installed from source with the following command in order to use the models:
+ ```bash
+ pip install --upgrade git+https://github.com/huggingface/transformers.git
+ ```
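To confirm that the source install above took effect, the installed version can be printed from Python. The following is a minimal sketch and not part of the committed README; the helper name `installed_version` is hypothetical, and the expectation that a build from the main branch reports a development suffix such as `.dev0` is an assumption about the library's versioning convention.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    else:
        # Assumption: builds from the main branch usually carry a suffix like ".dev0".
        print(f"transformers {found}")
```

If the printed version has no development suffix, the environment is probably still using a released wheel rather than the source build.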
+
+ A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
  ```python
  from transformers import AutoTokenizer, AutoModelForMaskedLM
  import torch