Update README.md
README.md (changed)
@@ -53,8 +53,20 @@ for output in outputs:
 
 We find that this is the best-performing model in the 7/8B class of LLMs on a multitude of Japanese language benchmarks.
 
+We calculate our Japanese evaluation scores using our [lightblue-tech/japanese_llm_eval](https://github.com/lightblue-tech/japanese_llm_eval) repo.
+
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/2obyDbrjiNV3PGfwom6EI.png)
 
+We also compare our Japanese model to our multilingual model using our [multilingual_mt_bench](https://github.com/Peter-Devine/multilingual_mt_bench/tree/main/fastchat/llm_judge) repo.
+
+| | **lightblue/suzume-llama-3-8B-japanese** | **lightblue/suzume-llama-3-8B-multilingual** | **Nexusflow/Starling-LM-7B-beta** | **gpt-3.5-turbo** |
+|-----------------|------------------------------------------|----------------------------------------------|-----------------------------------|-------------------|
+| **Japanese 🇯🇵** | 6.24 | 6.56 | 6.22 | 7.84 |
+
+Here, we find that our multilingual model outperforms our Japanese model on the Japanese MT-Bench benchmark, indicating that the multilingual model generalized better to Japanese MT-Bench by training on more data, even though that added data was not in Japanese.
+
+Note: the discrepancy between the MT-Bench scores of the first and second evaluations of `lightblue/suzume-llama-3-8B-japanese` is due to the difference in the system messages of the two evaluation harnesses: the former's system message is in Japanese while the latter's is in English.
+
 # Training data
 
 We train on three sources of data to create this model
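The note about the two harnesses can be illustrated with a minimal sketch: both send the model the same user turn, and only the system turn differs in language. The prompt strings and the `build_chat` helper below are illustrative assumptions, not the actual prompts or code from either evaluation repo.

```python
# Illustrative sketch of how two evaluation harnesses can differ only in the
# language of the system message. The prompt wordings are assumed examples.

JA_SYSTEM = "あなたは役立つアシスタントです。"  # assumed Japanese system prompt
EN_SYSTEM = "You are a helpful assistant."      # assumed English system prompt


def build_chat(system_message: str, user_message: str) -> list:
    """Build an OpenAI-style chat message list with the given system message."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]


question = "日本の首都はどこですか？"  # a sample MT-Bench-style question

ja_chat = build_chat(JA_SYSTEM, question)  # first harness: Japanese system message
en_chat = build_chat(EN_SYSTEM, question)  # second harness: English system message

# The user turn is identical across harnesses; only the system turn differs,
# which is enough to shift a chat model's scores on the same benchmark.
assert ja_chat[1] == en_chat[1]
assert ja_chat[0]["content"] != en_chat[0]["content"]
```

Because the model conditions on the whole conversation, including the system turn, even this one-line difference in the prompt can move MT-Bench scores.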