---
license: apache-2.0
datasets:
- wenbopan/Fusang-v1
- wenbopan/OpenOrca-zh-20k
language:
- zh
- en
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/62cd3a3691d27e60db0698b0/2peGbPRq4jE-OoS9ndkOx.jpeg)

# Fi-9B

Fi-9B is an improved [Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K) with extensive instruction tuning on [Fusang-V1](https://huggingface.co/datasets/wenbopan/Fusang-v1). Compared to Yi-9B-200K, Fi-9B gains greater capability across various downstream tasks and in long-context modeling, thanks to the large-scale synthetic data in Fusang-V1.

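Since Fi-9B is a fine-tune of Yi-9B-200K, it should load through the standard `transformers` causal-LM API. Below is a minimal loading sketch; the repository id `wenbopan/Fi-9B` is an assumption (this card does not state it), and the generation settings are illustrative rather than recommended values.

```python
# Minimal loading sketch with Hugging Face transformers.
# NOTE: the repo id below is an assumption (not stated in this card);
# replace it with the actual model repository if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenbopan/Fi-9B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s) automatically
)

prompt = "Briefly explain what instruction tuning is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
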
## Performance

Fi-9B improves over Yi-9B-200K in most dimensions, especially in long-range modeling and bilingual (English and Chinese) understanding. Fi-9B is competitive among open-source models at around 9B parameters: it performs well on factual tasks and is preferred by LLM judges.

### Fact-based Evaluation (Open LLM Leaderboard)

| **Metric**     | **MMLU**  | **GSM8K** | **HellaSwag** | **TruthfulQA** | **ARC**     | **Winogrande** |
| -------------- | --------- | --------- | ------------- | -------------- | ----------- | -------------- |
| **Yi-9B-200K** | 65.73     | 50.49     | 56.72         | 33.80          | 69.25       | 71.67          |
| **Fi-9B-200K** | **68.80** | **63.08** | **57.28**     | **40.86**      | **72.58**   | 71.11          |

### Long-context Modeling (LongBench)

| **Name**       | **Average_zh** | **Average_en** | **Code Completion** |
|----------------|----------------|----------------|---------------------|
| **Yi-9B-200K** | 30.288         | 36.7071        | 72.2                |
| **Fi-9B-200K** | **41.092**     | **40.9536**    | 46.0                |

<details>
<summary>Score breakdown</summary>

| **Name**       | **Few-shot Learning_en** | **Synthetic Tasks_en** | **Single-Doc QA_en** | **Multi-Doc QA_en** | **Summarization_en** | **Few-shot Learning_zh** | **Synthetic Tasks_zh** | **Single-Doc QA_zh** | **Multi-Doc QA_zh** | **Summarization_zh** |
|----------------|--------------------------|------------------------|----------------------|---------------------|----------------------|--------------------------|------------------------|----------------------|---------------------|----------------------|
| **Yi-9B-200K** | 60.6                     | 22.8                   | 30.9                 | 38.9                | 25.8                 | 46.5                     | 28.0                   | 49.6                 | 17.7                | 9.7                  |
| **Fi-9B-200K** | **63.8**                 | **40.2**               | **36.2**             | 38.0                | **26.3**             | 30.0                     | **75.1**               | **55.6**             | **30.7**            | **14.1**             |

</details>

<!--### Performance on Preference TODO-->

### Bilingual Ability (CMMLU & MMLU)

| **Name**       | MMLU      | **CMMLU** |
| -------------- | --------- | --------- |
| **Yi-9B-200K** | 65.73     | 71.97     |
| **Fi-9B-200K** | **68.80** | **73.28** |


## Current Limitations

- This version of Fi-9B may fail to stop generation in some scenarios. I will fix that soon; a temporary workaround is sketched below.
- Compared to the original Yi-9B-200K, Fi-9B shows degraded code-completion ability, likely due to the lack of raw code data during instruction tuning.
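
As a temporary mitigation for the stopping issue, generation length can be capped explicitly. This is a minimal sketch reusing the `model`, `tokenizer`, and `inputs` objects from the loading example above; the parameter values are illustrative, not tuned recommendations.

```python
# Temporary workaround: bound generation length and set an explicit EOS token
# so runaway generations are cut off. Values are illustrative only.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,                    # hard cap on generated length
    eos_token_id=tokenizer.eos_token_id,   # stop when the EOS token is produced
    repetition_penalty=1.1,                # mildly discourage repetitive loops
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```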