Add section which got accidentally removed (#3)
- Add section which got accidentally removed (6374d40585819de3294190e6da9e4c381702a87c)
Co-authored-by: Niels Rogge <[email protected]>
README.md
@@ -393,4 +393,48 @@ Reproduction script: https://huggingface.co/infgrad/dewey_en_beta/blob/main/scri
 | [voyage-3](https://blog.voyageai.com/2024/09/18/voyage-3/) | 100% | Unknown | 1024 | 32000 | 74.06 | 74.06 | 74.06 |
 | [inf-retriever-v1](https://huggingface.co/infly/inf-retriever-v1) | 100% | 7B | 3584 | 32768 | 73.19 | 73.19 | 73.19 |
 
-### 3.
+### 3.3 LoCoV1
+
+URL: https://huggingface.co/datasets/hazyresearch/LoCoV1-Queries\
+https://huggingface.co/datasets/hazyresearch/LoCoV1-Documents
+
+Reproduction script: https://huggingface.co/infgrad/dewey_en_beta/blob/main/scripts/evaluate/run_evaluate_loco.py
+
+Metric: NDCG@10
+
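For readers unfamiliar with the metric named above, here is a minimal sketch of NDCG@10. It assumes the common linear-gain formulation with a log2 rank discount; the reproduction script may compute it differently (for example through an evaluation library). The function names and the sample query are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels."""
    # enumerate() is 0-based, so rank 1 maps to a discount of log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical query: one relevant document, retrieved at rank 3.
ranked = [0, 0, 1, 0, 0]
score = ndcg_at_k(ranked, all_relevances=[1], k=10)
print(round(score, 4))  # 0.5, since 1 / log2(4) = 0.5 and the ideal DCG is 1
```

A dataset-level score is then typically the mean of this value over all queries.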
+Result:
+
+| **dataset-name** | **bge-m3-8k** | **gte-modernbert-base-8k** | **Linq-Embed-Mistral-4k** | **Linq-Embed-Mistral-8k** | **SFR-Embedding-Mistral-8k** | **e5-mistral-7b-instruct-8k** | **dewey_en_beta-8k** | **dewey_en_beta_64k** | **dewey_en_beta_64k-multi-vectors** |
+|:---------------------------------:|:-------------:|:--------------------------:|:-------------------------:|:-------------------------:|:----------------------------:|:-----------------------------:|:--------------------:|:------------------------:|:--------------------------------------:|
+| **2wikimqa_test** | 0.9271 | 0.8658 | 0.8884 | 0.9067 | 0.8965 | 0.8901 | 0.8953 | 0.9051 | 0.9775 |
+| **courtlistener_HTML_test** | 0.1933 | 0.2349 | 0.3551 | 0.3670 | 0.3647 | 0.3543 | 0.3415 | 0.3616 | 0.4775 |
+| **courtlistener_Plain_Text_test** | 0.1888 | 0.2478 | 0.3675 | 0.3761 | 0.3679 | 0.3579 | 0.3377 | 0.3485 | 0.4426 |
+| **gov_report_test** | 0.9869 | 0.9750 | 0.9832 | 0.9837 | 0.9816 | 0.9823 | 0.9855 | 0.9883 | 0.9853 |
+| **legal_case_reports_test** | 0.3702 | 0.4476 | 0.5398 | 0.5432 | 0.5319 | 0.4850 | 0.5474 | 0.5875 | 0.6534 |
+| **multifieldqa_test** | 0.9373 | 0.9341 | 0.9345 | 0.9327 | 0.9450 | 0.9321 | 0.9687 | 0.9564 | 0.9754 |
+| **passage_retrieval_test** | 0.4493 | 0.5271 | 0.3470 | 0.3407 | 0.2902 | 0.3248 | 0.7562 | 0.7389 | 0.8550 |
+| **qasper_abstract_test** | 1.0000 | 0.9806 | 0.9982 | 0.9982 | 0.9973 | 0.9965 | 0.9973 | 0.9982 | 0.9982 |
+| **qasper_title_test** | 0.9860 | 0.8892 | 0.9838 | 0.9833 | 0.9861 | 0.9812 | 0.9742 | 0.9742 | 0.9840 |
+| **qmsum_test** | 0.6668 | 0.6307 | 0.6816 | 0.7237 | 0.7169 | 0.7148 | 0.7438 | 0.7613 | 0.8154 |
+| **stackoverflow_test** | 0.9634 | 0.9087 | 0.9760 | 0.9760 | 0.9766 | 0.9690 | 0.9362 | 0.9369 | 0.9443 |
+| **summ_screen_fd_test** | 0.9320 | 0.9379 | 0.9747 | 0.9635 | 0.9656 | 0.9580 | 0.9796 | 0.9821 | 0.9788 |
+| **Average** | 0.7168 | 0.7150 | 0.7525 | 0.7579 | 0.7517 | 0.7455 | 0.7886 | **0.7949** | **0.8406** |
+
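The Average row appears to be an unweighted mean over the 12 LoCoV1 datasets. As a quick sanity check under that assumption, the sketch below recomputes it for the `dewey_en_beta_64k-multi-vectors` column, using the per-dataset scores copied from the table; the variable names are illustrative only.

```python
# Per-dataset NDCG@10 scores for dewey_en_beta_64k-multi-vectors, copied from the table.
multi_vector_scores = {
    "2wikimqa_test": 0.9775,
    "courtlistener_HTML_test": 0.4775,
    "courtlistener_Plain_Text_test": 0.4426,
    "gov_report_test": 0.9853,
    "legal_case_reports_test": 0.6534,
    "multifieldqa_test": 0.9754,
    "passage_retrieval_test": 0.8550,
    "qasper_abstract_test": 0.9982,
    "qasper_title_test": 0.9840,
    "qmsum_test": 0.8154,
    "stackoverflow_test": 0.9443,
    "summ_screen_fd_test": 0.9788,
}

# Unweighted (macro) mean: every dataset counts equally, regardless of its size.
average = sum(multi_vector_scores.values()) / len(multi_vector_scores)
print(f"{average:.4f}")  # 0.8406, matching the Average row
```

Because the mean is unweighted, the two low-scoring courtlistener datasets pull the average down as much as any other dataset would.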
+## 4 Limitations
+
+- Only English text is supported.
+- On short-text tasks, performance may not match that of conventional short-text embedding models.
+- As noted earlier, this model is still at the alpha or beta stage and may show unexpected behaviour.
+
+## 5 Cite
+
+```
+@misc{zhang2025deweylongcontextembedding,
+      title={Dewey Long Context Embedding Model: A Technical Report},
+      author={Dun Zhang and Panxiang Zou and Yudong Zhou},
+      year={2025},
+      eprint={2503.20376},
+      archivePrefix={arXiv},
+      primaryClass={cs.IR},
+      url={https://arxiv.org/abs/2503.20376},
+}