sigridjineth
committed on
Update README.md
README.md
CHANGED
@@ -36,6 +36,37 @@ These combined resources ensure coverage across a wide range of topics, styles,
- **Need for Evaluation**:
Developing and standardizing benchmarks for generalized Korean retrieval tasks (especially for rerankers) will be an ongoing effort.

## Evaluation
The [AutoRAG Benchmark](https://github.com/Marker-Inc-Korea/AutoRAG-example-korean-embedding-benchmark) serves as both the evaluation dataset and the toolkit for reporting these metrics.

### Model: `sigridjineth/ko-reranker-v1.1-preview`

| top_k | Execution Time | F1     | Recall | Precision | MAP    | MRR    | NDCG   | Is Best |
|-------|----------------|--------|--------|-----------|--------|--------|--------|---------|
| 1     | 0.0438         | 0.6754 | 0.6754 | 0.6754    | 0.6754 | 0.6754 | 0.6754 | True    |
| 3     | 0.0486         | 0.3684 | 0.7368 | 0.2456    | 0.7032 | 0.7032 | 0.7119 | False   |
| 5     | 0.0446         | 0.3684 | 0.7368 | 0.2456    | 0.7032 | 0.7032 | 0.7119 | False   |

---

### Model: `Alibaba-NLP/gte-multilingual-reranker-base`

| top_k | Execution Time | F1     | Recall | Precision | MAP    | MRR    | NDCG   | Is Best |
|-------|----------------|--------|--------|-----------|--------|--------|--------|---------|
| 1     | 0.0481         | 0.6316 | 0.6316 | 0.6316    | 0.6316 | 0.6316 | 0.6316 | True    |
| 3     | 0.0427         | 0.3596 | 0.7193 | 0.2398    | 0.6725 | 0.6725 | 0.6846 | False   |
| 5     | 0.0442         | 0.3596 | 0.7193 | 0.2398    | 0.6725 | 0.6725 | 0.6846 | False   |

---

### Model: `dragonkue/bge-reranker-v2-m3-ko`

| top_k | Execution Time | F1     | Recall | Precision | MAP    | MRR    | NDCG   | Is Best |
|-------|----------------|--------|--------|-----------|--------|--------|--------|---------|
| 1     | 0.0814         | 0.6930 | 0.6930 | 0.6930    | 0.6930 | 0.6930 | 0.6930 | True    |
| 3     | 0.0813         | 0.3596 | 0.7193 | 0.2398    | 0.7061 | 0.7061 | 0.7096 | False   |
| 5     | 0.0824         | 0.3596 | 0.7193 | 0.2398    | 0.7061 | 0.7061 | 0.7096 | False   |

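
The F1 column in the tables above is consistent with the harmonic mean of the Precision and Recall columns. The sketch below recomputes it from the reported values as a sanity check. It assumes that definition of F1, since the README does not state how the AutoRAG toolkit computes it, and `f1_score` and `rows` are illustrative names that are not part of AutoRAG. The inputs are already rounded to four decimals, so the comparison allows a small tolerance.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, top_k, precision, recall, reported F1), copied from the tables above.
# The top_k = 5 rows repeat the top_k = 3 values for these columns, so they are omitted.
rows = [
    ("sigridjineth/ko-reranker-v1.1-preview", 1, 0.6754, 0.6754, 0.6754),
    ("sigridjineth/ko-reranker-v1.1-preview", 3, 0.2456, 0.7368, 0.3684),
    ("Alibaba-NLP/gte-multilingual-reranker-base", 1, 0.6316, 0.6316, 0.6316),
    ("Alibaba-NLP/gte-multilingual-reranker-base", 3, 0.2398, 0.7193, 0.3596),
    ("dragonkue/bge-reranker-v2-m3-ko", 1, 0.6930, 0.6930, 0.6930),
    ("dragonkue/bge-reranker-v2-m3-ko", 3, 0.2398, 0.7193, 0.3596),
]

for model, top_k, precision, recall, reported_f1 in rows:
    computed = f1_score(precision, recall)
    # Inputs are rounded to four decimals, so compare with a small tolerance.
    matches = math.isclose(computed, reported_f1, abs_tol=5e-4)
    print(f"{model} top_k={top_k}: computed={computed:.4f} "
          f"reported={reported_f1:.4f} match={matches}")
```

At `top_k = 1` precision and recall coincide, so F1 equals both. At `top_k = 3` the reported precision is one third of the reported recall in every table, which makes F1 half the recall.
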
## Usage (transformers>=4.36.0)

```python