akariasai committed on
Commit 3e42dfc · 1 parent: d2678e8

Update README.md

Files changed (1)
  1. README.md +20 -8
README.md CHANGED
@@ -1,14 +1,25 @@
- ## facebook/tart-full-flan-t5-xl
 
- `facebook/tart-full-flan-t5-xl` is a multi-task cross-encoder model trained via instruction-tuning on approximately 40 retrieval tasks, initialized with [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl).
 
  ### Installation
- ```
  git clone https://github.com/facebookresearch/tart
  pip install -r requirements.txt
  cd tart/TART
  ```
 
  TART-full can be loaded through our customized EncT5 model.
  ```python
  from src.modeling_enc_t5 import EncT5ForSequenceClassification
@@ -17,14 +28,14 @@ import torch
  import torch.nn.functional as F
 
  # load TART full and tokenizer
- model = EncT5ForSequenceClassification.from_pretrained("tart_full_flan_t5_xl")
- tokenizer = EncT5Tokenizer.from_pretrained("tart_full_flan_t5_xl")
  model.eval()
 
  q = "What is the population of Tokyo?"
  in_answer = "retrieve a passage that answers this question from Wikipedia"
 
- p_1 = "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million at the start of 2022."
  p_2 = "Tokyo, officially the Tokyo Metropolis (東京都, Tōkyō-to), is the capital and largest city of Japan."
 
  # 1. TART-full can identify more relevant paragraph.
@@ -32,7 +43,8 @@ features = tokenizer(['{0} [SEP] {1}'.format(in_answer, q), '{0} [SEP] {1}'.form
  with torch.no_grad():
      scores = model(**features).logits
      normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]
- print([p_1, p_2]np.argmax(normalized_scores)) # "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million."
 
  # 2. TART-full can identify the document that is more relevant AND follows instructions.
  in_sim = "You need to find duplicated questions in Wiki forum. Could you find a question that is similar to this question"
@@ -42,5 +54,5 @@ with torch.no_grad():
      scores = model(**features).logits
      normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]
 
- print([p, q_1]np.argmax(normalized_scores)) # "How many people live in Tokyo?"
  ```
 
+ # Task-aware Retrieval with Instructions
 
+ Official repository: [github.com/facebookresearch/tart](https://github.com/facebookresearch/tart)
+
+ ### Model description
+
+ `facebook/tart-full-flan-t5-xl` is a multi-task cross-encoder model trained via instruction-tuning on approximately 40 retrieval tasks and initialized from [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl).
+
+ TART-full is a 1.5-billion-parameter cross-encoder that reranks top documents given a query and a natural language instruction (e.g., *find a Wikipedia paragraph that answers this question.*).
+ Experimental results on the widely used [BEIR](https://github.com/beir-cellar/beir) and [LOTTE](https://huggingface.co/datasets/colbertv2/lotte) benchmarks, and on our new evaluation, [X^2-Retrieval](https://github.com/facebookresearch/tart/cross_task_cross_eval), show that TART-full outperforms previous state-of-the-art methods by leveraging natural language instructions.
+
+ More details about modeling and training are in our paper: [Task-aware Retrieval with Instructions](https://arxiv.org/abs/2211.09260).
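The model description says TART-full reranks the top documents for a query plus an instruction. Below is a minimal sketch of that final ranking step. It assumes the model returns two logits per (query, document) pair with index 1 meaning "relevant", as the usage example's `float(score[1])` after `F.softmax(scores, dim=1)` suggests. The functions `relevance_prob` and `rerank` and the logit values are hypothetical stand-ins. In practice the logits would come from `model(**features).logits`.

```python
import math
from typing import List, Sequence, Tuple


def relevance_prob(logits: Sequence[float]) -> float:
    """Softmax over a [not relevant, relevant] logit pair; return P(relevant).

    Assumption: index 1 is the "relevant" class, as in the README's score[1].
    """
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def rerank(
    docs: Sequence[str], logits: Sequence[Sequence[float]], top_k: int
) -> List[Tuple[str, float]]:
    """Sort first-stage candidates by P(relevant), highest first; keep top_k."""
    if len(docs) != len(logits):
        raise ValueError("expected one logit pair per document")
    scored = [(doc, relevance_prob(pair)) for doc, pair in zip(docs, logits)]
    # list.sort is stable, so tied scores keep their first-stage order
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical logits standing in for model(**features).logits.
docs = ["doc A", "doc B", "doc C"]
fake_logits = [[0.2, -1.0], [-0.5, 2.0], [0.0, 0.5]]
print([doc for doc, _ in rerank(docs, fake_logits, top_k=2)])  # ['doc B', 'doc C']
```

For two classes, P(relevant) = 1 / (1 + exp(logits[0] - logits[1])). Sorting by it therefore gives the same order as sorting by the raw margin `logits[1] - logits[0]`. The softmax mainly turns the scores into probabilities in [0, 1].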
 
  ### Installation
+ ```sh
  git clone https://github.com/facebookresearch/tart
  pip install -r requirements.txt
  cd tart/TART
  ```
 
+ ### How to use?
+
  TART-full can be loaded through our customized EncT5 model.
  ```python
  from src.modeling_enc_t5 import EncT5ForSequenceClassification
 
  import torch.nn.functional as F
 
  # load TART full and tokenizer
+ model = EncT5ForSequenceClassification.from_pretrained("facebook/tart-full-flan-t5-xl")
+ tokenizer = EncT5Tokenizer.from_pretrained("facebook/tart-full-flan-t5-xl")
  model.eval()
 
  q = "What is the population of Tokyo?"
  in_answer = "retrieve a passage that answers this question from Wikipedia"
 
+ p_1 = "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million at the start of 2022, the first decline since 1996, the metropolitan government reported Monday."
  p_2 = "Tokyo, officially the Tokyo Metropolis (東京都, Tōkyō-to), is the capital and largest city of Japan."
 
  # 1. TART-full can identify more relevant paragraph.
 
  with torch.no_grad():
      scores = model(**features).logits
      normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]
+
+ print([p_1, p_2][np.argmax(normalized_scores)]) # "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million ... "
 
  # 2. TART-full can identify the document that is more relevant AND follows instructions.
  in_sim = "You need to find duplicated questions in Wiki forum. Could you find a question that is similar to this question"
 
      scores = model(**features).logits
      normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]
 
+ print([p, q_1][np.argmax(normalized_scores)]) # "How many people live in Tokyo?"
  ```
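In the usage example, the first argument to the tokenizer is a list that repeats the `'{0} [SEP] {1}'`-formatted instruction and query once per candidate. Below is a minimal sketch that builds such parallel lists for any number of candidates and picks the winner without NumPy. `build_pair_batch` and `pick_best` are hypothetical helpers. Passing the candidates as the paired second segment, with the keyword arguments shown in the comment, is an assumption that `EncT5Tokenizer` follows the standard Hugging Face tokenizer call convention.

```python
from typing import List, Sequence, Tuple


def build_pair_batch(
    instruction: str, query: str, candidates: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return parallel (first_segments, second_segments) lists.

    Each first segment follows the README's "{instruction} [SEP] {query}"
    pattern and is repeated once per candidate so the two lists line up.
    """
    first = ["{0} [SEP] {1}".format(instruction, query) for _ in candidates]
    return first, list(candidates)


def pick_best(candidates: Sequence[str], normalized_scores: Sequence[float]) -> str:
    """Pure-Python stand-in for candidates[np.argmax(normalized_scores)].

    Like np.argmax, max() returns the first index on ties.
    """
    if not candidates or len(candidates) != len(normalized_scores):
        raise ValueError("need at least one candidate and one score per candidate")
    best = max(range(len(normalized_scores)), key=lambda i: normalized_scores[i])
    return candidates[best]


q = "What is the population of Tokyo?"
in_answer = "retrieve a passage that answers this question from Wikipedia"
first, second = build_pair_batch(in_answer, q, ["passage one", "passage two"])
print(first[0])  # retrieve a passage that answers this question from Wikipedia [SEP] What is the population of Tokyo?
# Assumed next step with the real tokenizer (not run here):
# features = tokenizer(first, second, padding=True, truncation=True, return_tensors="pt")
print(pick_best(second, [0.1, 0.9]))  # passage two
```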