yjoonjang committed
Commit b940fa1 • 1 Parent(s): 7656f9a

Update README.md

Files changed (1)
  1. README.md +51 -6
README.md CHANGED
@@ -13,21 +13,30 @@ pipeline_tag: sentence-similarity
13
 
14
  # KoE5
15
 
16
- **KoE5: A New Dataset and Model for Improving Korean Embedding Performance** - μž₯μ˜μ€€, μ†μ€€μ˜, λ°•μ°¬μ€€, 이병ꡬ, μ΄νƒœλ―Ό, μž„ν¬μ„, accepted as an oral presentation at HCLT 2024
 
 
17
 
18
- This is a fine-tuned model based on multilingual-e5-large, trained with [ko-triplet-v1.0](nlpai-lab/ko-triplet-v1.0).
19
 
 
20
 
21
- ## Uses
22
 
23
- ### Direct Usage (Sentence Transformers)
24

25
  First install the Sentence Transformers library:
26
 
27
  ```bash
28
  pip install -U sentence-transformers
29
  ```
30
-
31
  Then you can load this model and run inference.
32
  ```python
33
  from sentence_transformers import SentenceTransformer
@@ -53,6 +62,33 @@ print(similarities)
53
  # [0.3897, 0.3740, 1.0000]])
54
  ```
55

56
  ## FAQ
57
 
58
  **1. Do I need to add the prefix "query: " and "passage: " to input texts?**
@@ -69,7 +105,16 @@ Here are some rules of thumb:
69
  ## Citation
70
 
71
 If you find our paper or models helpful, please consider citing as follows:
72
-
73
  ```
74
  @article{wang2024multilingual,
75
  title={Multilingual E5 Text Embeddings: A Technical Report},
 
13
 
14
  # KoE5
15
 
16
+ Introducing KoE5, a model with advanced retrieval capabilities.
17
+ It shows remarkable performance in Korean text retrieval, specifically outperforming most multilingual embedding models.
18
+ To our knowledge, it is one of the best publicly available Korean retrieval models.
19
 
20
+ For details, visit the [KoE5 repository](https://github.com/nlpai-lab/KoE5).
21
 
22
+ ### Model Description
23
 
24
+ This is the model card of a πŸ€— transformers model that has been pushed to the Hub.
25
 
26
+ - **Developed by:** [NLP&AI Lab](http://nlp.korea.ac.kr/)
27
+ - **Language(s) (NLP):** Korean, English
28
+ - **License:** MIT
29
+ - **Finetuned from model:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
30
+ - **Finetuned dataset:** [ko-triplet-v1.0](nlpai-lab/ko-triplet-v1.0)
31
 
32
+ ## Example code
33
+ ### Install Dependencies
34
  First install the Sentence Transformers library:
35
 
36
  ```bash
37
  pip install -U sentence-transformers
38
  ```
39
+ ### Python code
40
  Then you can load this model and run inference.
41
  ```python
42
  from sentence_transformers import SentenceTransformer
 
62
  # [0.3897, 0.3740, 1.0000]])
63
  ```
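The middle of this example is elided by the diff above, so here is a self-contained sketch of the same flow with made-up sentences (the actual card's inputs and similarity values differ). Note the E5-style `query: ` / `passage: ` prefixes:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nlpai-lab/KoE5")

# E5-family models expect task prefixes on every input
sentences = [
    "query: 섀악산은 μ–΄λŠ μ§€μ—­μ— μžˆλ‚˜μš”?",
    "passage: 섀악산은 κ°•μ›λ„μ— μœ„μΉ˜ν•œ μ‚°μ΄λ‹€.",
    "passage: ν•œλΌμ‚°μ€ μ œμ£Όλ„μ— μžˆλ‹€.",
]
embeddings = model.encode(sentences)

# Pairwise cosine-similarity matrix between all inputs
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```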
64
 
65
+ ## Training Details
66
+
67
+ ### Training Data
68
+
69
+ - [ko-triplet-v1.0](nlpai-lab/ko-triplet-v1.0)
70
+ - Korean (query, document, hard negative) triplets (open data)
71
+ - About 700,000+ examples used in total
72
+
73
+ ### Training Procedure
74
+
75
+ - **loss:** **[CachedMultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss)** from Sentence Transformers
76
+ - **batch size:** 512
77
+ - **learning rate:** 1e-05
78
+ - **epochs:** 1
79
+
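These hyperparameters map directly onto the Sentence Transformers v3 training API. Below is a minimal sketch of how such a run could be wired up; the dataset column handling, prefixing, and output path are assumptions, not the authors' exact script:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

# Base model that KoE5 was fine-tuned from
model = SentenceTransformer("intfloat/multilingual-e5-large")

# (query, document, hard negative) triplets; in a real run the E5-style
# "query: " / "passage: " prefixes would need to be prepended first
train_dataset = load_dataset("nlpai-lab/ko-triplet-v1.0", split="train")

# Cached variant of MultipleNegativesRankingLoss: chunks the forward pass
# into mini-batches so an effective batch size of 512 fits in memory
loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)

args = SentenceTransformerTrainingArguments(
    output_dir="koe5-sketch",          # hypothetical output path
    num_train_epochs=1,                # epochs: 1
    per_device_train_batch_size=512,   # batch size: 512
    learning_rate=1e-5,                # learning rate: 1e-05
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

The cached loss trades extra compute for memory, which is what makes the large in-batch-negatives pool at batch size 512 practical.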
80
+ ## Evaluation
81
+ ### Metrics
82
+ - NDCG@1, F1@1, NDCG@3, F1@3
83
+ ### Benchmark Datasets
84
+ - Ko-strategyQA
85
+ - AutoRAG-benchmark
86
+ - PublicHealthQA
87
+
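As a hedged illustration, NDCG@k on such benchmarks can be computed with Sentence Transformers' `InformationRetrievalEvaluator`; the queries, corpus, and relevance labels below are placeholders, and F1@k is not built in, so it would need a custom metric:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("nlpai-lab/KoE5")

# Placeholder data; a real run would load Ko-strategyQA etc. instead
queries = {"q1": "query: 세계 μ΅œμ΄ˆμ˜ λŒ€ν•™μ€ μ–΄λ””μΈκ°€μš”?"}
corpus = {
    "d1": "passage: 볼둜냐 λŒ€ν•™μ€ 1088년에 μ„€λ¦½λœ μ„Έκ³„ μ΅œμ΄ˆμ˜ λŒ€ν•™μ΄λ‹€.",
    "d2": "passage: ν•œκ°•μ€ μ„œμšΈμ„ κ°€λ‘œμ§ˆλŸ¬ 흐λ₯Έλ‹€.",
}
relevant_docs = {"q1": {"d1"}}  # ground-truth relevance judgments

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    ndcg_at_k=[1, 3],  # matches the NDCG@1 / NDCG@3 reported above
)
print(evaluator(model))  # dict mapping metric names to scores
```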
88
+ ## Results
89
+
90
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a4c4ed2548c41ad9b1421c/8toWmSrqH-aLKq1rSiqnv.png)
91
+
92
  ## FAQ
93
 
94
  **1. Do I need to add the prefix "query: " and "passage: " to input texts?**
 
105
  ## Citation
106
 
107
 If you find our paper or models helpful, please consider citing as follows:
108
+ ```text
109
+ @misc{KoE5,
110
+ author = {NLP & AI Lab and Human-Inspired AI research},
111
+ title = {KoE5: A New Dataset and Model for Improving Korean Embedding Performance},
112
+ year = {2024},
113
+ publisher = {Youngjoon Jang, Junyoung Son, Taemin Lee},
114
+ journal = {GitHub repository},
115
+ howpublished = {\url{https://github.com/nlpai-lab/KoE5}},
116
+ }
117
+ ```
118
  ```
119
  @article{wang2024multilingual,
120
  title={Multilingual E5 Text Embeddings: A Technical Report},