# Basic Information

This is the Dr. Decr model used in the XOR-TyDi leaderboard Task 1 whitebox submission.

https://nlp.cs.washington.edu/xorqa/


The detailed implementation of the model is described in:

https://arxiv.org/pdf/2112.08185.pdf

Source code to train the model can be found via PrimeQA's IR component:
https://github.com/primeqa/primeqa/tree/main/examples/drdecr

It is a neural IR model built on top of the ColBERTv1 API and is not directly compatible with the Hugging Face API.
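
For readers unfamiliar with ColBERT-style retrieval, the sketch below (not taken from the released code) illustrates the "late interaction" MaxSim scoring that ColBERTv1, and hence Dr. Decr, uses at query time: each query token embedding is matched to its most similar passage token embedding, and the per-token maxima are summed. The function name, shapes, and dimensions are illustrative assumptions.

```python
import torch

def late_interaction_score(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim scoring (illustrative sketch, not the released code).

    q_emb: (num_query_tokens, dim) query token embeddings
    p_emb: (num_passage_tokens, dim) passage token embeddings
    Both are assumed to be L2-normalized, so dot products are cosine similarities.
    """
    sim = q_emb @ p_emb.T                # (num_query_tokens, num_passage_tokens)
    return sim.max(dim=1).values.sum()   # best passage match per query token, summed

# Toy usage with random, normalized embeddings
q = torch.nn.functional.normalize(torch.randn(32, 128), dim=-1)
p = torch.nn.functional.normalize(torch.randn(180, 128), dim=-1)
print(float(late_interaction_score(q, p)))
```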
Inference results on the XOR Dev dataset:

| Language | R@2kt | R@5kt |
|----------|-------|-------|
| te       | 66.67 | 70.88 |
| bn       | 70.23 | 75.08 |
| fi       | 82.24 | 86.18 |
| ja       | 65.92 | 72.93 |
| ko       | 67.93 | 71.73 |
| ru       | 63.07 | 69.71 |
| ar       | 78.15 | 82.77 |
| Avg      | 70.60 | 75.61 |
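
R@2kt and R@5kt are XOR-Retrieve's answer-recall metrics: the fraction of questions for which a correct answer string appears in the top retrieved passages, truncated to a 2,000- or 5,000-token budget. The sketch below is only a rough illustration of that computation under naive whitespace tokenization; the function names are assumptions, and the official XOR-TyDi evaluation script should be used to reproduce reported numbers.

```python
def recall_at_kt(ranked_passages, gold_answers, token_budget):
    """Illustrative R@kt check for a single question (not the official scorer).

    ranked_passages: passage strings in ranked order.
    gold_answers: acceptable answer strings.
    Returns True if any answer appears within the first `token_budget` tokens
    of the concatenated ranking (naive whitespace tokenization).
    """
    tokens = []
    for passage in ranked_passages:
        tokens.extend(passage.split())
        if len(tokens) >= token_budget:
            break
    text = " ".join(tokens[:token_budget])
    return any(answer in text for answer in gold_answers)

def recall_percentage(hits):
    """Aggregate per-question booleans into a percentage, as in the table above."""
    return 100.0 * sum(hits) / len(hits)
```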

# Limitations and Bias

This model starts from a pre-trained XLM-RoBERTa base model and is fine-tuned on the 7 languages covered by the XOR-TyDi leaderboard. Performance on other languages has not been tested.

Since the model is fine-tuned from the large pre-trained language model XLM-RoBERTa, biases associated with the pre-existing XLM-RoBERTa model may be present in our fine-tuned model, Dr. Decr.
 
# Citation
```
@article{Li2021_DrDecr,
  doi = {10.48550/ARXIV.2112.08185},
  url = {https://arxiv.org/abs/2112.08185},
  author = {Li, Yulong and Franz, Martin and Sultan, Md Arafat and Iyer, Bhavani and Lee, Young-Suk and Sil, Avirup},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Learning Cross-Lingual IR from an English Retriever},
  publisher = {arXiv},
  year = {2021}
}
```