Dan Fu committed on
Commit
785c4ec
·
1 Parent(s): 6471836
Files changed (1)
  1. README.md +78 -10
README.md CHANGED
@@ -2,14 +2,15 @@
2
  license: apache-2.0
3
  language:
4
  - en
5
- pipeline_tag: sentence-similarity
6
  inference: false
7
  ---
8
 
9
  # Monarch Mixer-BERT
10
 
11
- The 80M checkpoint for M2-BERT-base from the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109).
12
- This model has been pretrained with sequence length 2048, and it has been fine-tuned for long-context retrieval.
13
 
14
  This model was trained by Jon Saad-Falcon, Dan Fu, and Simran Arora.
15
 
@@ -19,21 +20,88 @@ Check out our [GitHub](https://github.com/HazyResearch/m2/tree/main) for instruc
19
 
20
  You can load this model using Hugging Face `AutoModel`:
21
  ```python
22
- from transformers import AutoModelForMaskedLM
23
- model = AutoModelForMaskedLM.from_pretrained("togethercomputer/m2-bert-80M-2k-retrieval", trust_remote_code=True)
24
  ```
25
 
26
  This model generates embeddings for retrieval. The embeddings have a dimensionality of 768:
27
- ```
28
- from transformers import AutoTokenizer, AutoModelForMaskedLM
29
 
30
  max_seq_length = 2048
31
  testing_string = "Every morning, I make a cup of coffee to start my day."
32
- model = AutoModelForMaskedLM.from_pretrained("togethercomputer/m2-bert-80M-2k-retrieval", trust_remote_code=True)
33
 
34
- tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=max_seq_length)
35
- input_ids = tokenizer([testing_string], return_tensors="pt", padding="max_length", return_token_type_ids=False, truncation=True, max_length=max_seq_length)
36
 
37
  outputs = model(**input_ids)
38
  embeddings = outputs['sentence_embedding']
39
  ```
2
  license: apache-2.0
3
  language:
4
  - en
5
+ pipeline_tag: text-classification
6
  inference: false
7
  ---
8
 
9
  # Monarch Mixer-BERT
10
 
11
+ An 80M checkpoint of M2-BERT, pretrained with sequence length 2048 and fine-tuned for long-context retrieval.
12
+
13
+ Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequences.
14
 
15
  This model was trained by Jon Saad-Falcon, Dan Fu, and Simran Arora.
16
 
 
20
 
21
  You can load this model using Hugging Face `AutoModel`:
22
  ```python
23
+ from transformers import AutoModelForSequenceClassification
24
+ model = AutoModelForSequenceClassification.from_pretrained(
25
+     "togethercomputer/m2-bert-80M-2k-retrieval",
26
+     trust_remote_code=True
27
+ )
28
  ```
29
 
30
+ You should expect to see a large error message about unused parameters for FlashFFTConv.
31
+ If you'd like to load the model with FlashFFTConv, you can check out our [GitHub](https://github.com/HazyResearch/m2/tree/main).
32
+
33
  This model generates embeddings for retrieval. The embeddings have a dimensionality of 768:
34
+ ```python
35
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
36
 
37
  max_seq_length = 2048
38
  testing_string = "Every morning, I make a cup of coffee to start my day."
39
+ model = AutoModelForSequenceClassification.from_pretrained(
40
+     "togethercomputer/m2-bert-80M-2k-retrieval",
41
+     trust_remote_code=True
42
+ )
43
 
44
+ tokenizer = AutoTokenizer.from_pretrained(
45
+     "bert-base-uncased",
46
+     model_max_length=max_seq_length
47
+ )
48
+ input_ids = tokenizer(
49
+     [testing_string],
50
+     return_tensors="pt",
51
+     padding="max_length",
52
+     return_token_type_ids=False,
53
+     truncation=True,
54
+     max_length=max_seq_length
55
+ )
56
 
57
  outputs = model(**input_ids)
58
  embeddings = outputs['sentence_embedding']
59
  ```
60
+
61
+ You can also get embeddings from this model using the Together API as follows (you can find your API key [here](https://api.together.xyz/settings/api-keys)):
62
+ ```python
63
+ import os
64
+ import requests
65
+
66
+ def generate_together_embeddings(text: str, model_api_string: str, api_key: str):
67
+     url = "https://api.together.xyz/api/v1/embeddings"
68
+     headers = {
69
+         "accept": "application/json",
70
+         "content-type": "application/json",
71
+         "Authorization": f"Bearer {api_key}"
72
+     }
73
+     session = requests.Session()
74
+     response = session.post(
75
+         url,
76
+         headers=headers,
77
+         json={
78
+             "input": text,
79
+             "model": model_api_string
80
+         }
81
+     )
82
+     if response.status_code != 200:
83
+         raise ValueError(f"Request failed with status code {response.status_code}: {response.text}")
84
+     return response.json()['data'][0]['embedding']
85
+
86
+ print(generate_together_embeddings(
87
+     'Hello world',
88
+     'togethercomputer/m2-bert-80M-2k-retrieval',
89
+     os.environ['TOGETHER_API_KEY'])[:10]
90
+ )
91
+ ```
92
+
93
+ ## Acknowledgments
94
+
95
+ Alycia Lee helped with AutoModel support.
96
+
97
+ ## Citation
98
+
99
+ If you use this model, or otherwise found our work valuable, you can cite us as follows:
100
+ ```
101
+ @inproceedings{fu2023monarch,
102
+   title={Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture},
103
+   author={Fu, Daniel Y and Arora, Simran and Grogan, Jessica and Johnson, Isys and Eyuboglu, Sabri and Thomas, Armin W and Spector, Benjamin and Poli, Michael and Rudra, Atri and R{\'e}, Christopher},
104
+   booktitle={Advances in Neural Information Processing Systems},
105
+   year={2023}
106
+ }
107
+ ```
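
The updated README says the model generates embeddings for retrieval but does not show how to compare them. Below is a minimal sketch of ranking documents against a query, assuming each embedding is a plain list of floats such as the one returned by the Together API snippet. Cosine similarity is used here as a common choice; the README does not state which similarity measure the model was trained with. `cosine_similarity` and `rank_documents` are hypothetical helper names, not part of the model code or the API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_emb: Sequence[float],
    doc_embs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from best to worst match."""
    scores = [
        (i, cosine_similarity(query_emb, doc_emb))
        for i, doc_emb in enumerate(doc_embs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for the 768-dimensional embeddings.
    query = [1.0, 0.0, 0.0]
    docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    for index, score in rank_documents(query, docs):
        print(index, round(score, 3))
```

In practice the query and each document would be embedded with the same model, and for large collections a vectorized library or a vector index would replace the pure-Python loop.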