---
inference: false
language:
- ja
- en
- de
- is
- zh
- cs
---

# A new version has been released

2024/03/04

[webbigdata/C3TR-Adapter](https://huggingface.co/webbigdata/C3TR-Adapter)

The GPU memory requirement has increased to 8.1 GB, but the model can still run on the free tier of Colab, and translation performance is much improved!

2023/10/21

[ALMA-7B-Ja-V2](https://huggingface.co/webbigdata/ALMA-7B-Ja-V2)

Overall performance has been improved.

The description below is for the old version. We encourage you to try one of the newer versions above.

# webbigdata/ALMA-7B-Ja

ALMA-7B-Ja (13.3 GB) is a machine translation model that uses ALMA's training method to translate between Japanese and English.

The [original ALMA-7B (26.95 GB)](https://huggingface.co/haoranxu/ALMA-7B) supports translation between English and Russian (ru). This model supports Japanese (ja) instead of Russian.

Like the original model, this model has been verified to have some translation ability between the following language pairs as well, but if you mainly need these languages, it is better to use the original [ALMA-13B model](https://huggingface.co/haoranxu/ALMA-13B).

- German (de) and English (en)
- Chinese (zh) and English (en)
- Icelandic (is) and English (en)
- Czech (cs) and English (en)

Translating from English (en→xx), BLEU/COMET:

| Models | de | cs | is | zh | ru/ja | Avg. |
|------------------|-------------|-------------|-------------|-------------|-------------|-------------|
| NLLB-54B | 34.50/86.45 | 37.60/90.15 | 24.15/81.76 | 27.38/78.91 | 30.96/87.92 | 30.92/85.04 |
| GPT-3.5-D | 31.80/85.61 | 31.30/88.57 | 15.90/76.28 | 38.30/85.76 | 27.50/86.74 | 28.96/84.59 |
| ALMA-7B (Original) | 30.31/85.59 | 29.88/89.10 | 25.71/85.52 | 36.87/85.11 | 27.13/86.98 | 29.89/86.49 |
| ALMA-7B-Ja (Ours) | 23.70/82.04 | 18.58/81.36 | 12.20/71.59 | 29.06/82.45 | 14.82/85.40 | 19.67/80.57 |

Translating to English (xx→en), BLEU/COMET:

| Models | de | cs | is | zh | ru/ja | Avg. |
|------------------|-------------|-------------|-------------|-------------|-------------|-------------|
| NLLB-54B | 26.89/78.94 | 39.11/80.13 | 23.09/71.66 | 16.56/70.70 | 39.11/81.88 | 28.95/76.66 |
| GPT-3.5-D | 30.90/84.79 | 44.50/86.16 | 31.90/82.13 | 25.00/81.62 | 38.50/84.80 | 34.16/83.90 |
| ALMA-7B (Original) | 30.26/84.00 | 43.91/85.86 | 35.97/86.03 | 23.75/79.85 | 39.37/84.58 | 34.55/84.02 |
| ALMA-7B-Ja (Ours) | 26.41/83.13 | 34.39/83.50 | 24.77/81.12 | 20.60/78.54 | 15.57/78.61 | 24.35/81.76 |

[Sample code for free Colab](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_Free_Colab_sample.ipynb)
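
For quick reference, below is a minimal sketch of how the model can be used with the Hugging Face `transformers` library. It assumes the ALMA-style translation prompt ("Translate this from X to Y: ...") described in the original ALMA repository; the example sentence and generation settings are only illustrative, and the linked Colab notebook remains the reference sample.

```python
# Minimal usage sketch (see assumptions above): load ALMA-7B-Ja in fp16
# and translate one Japanese sentence into English.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "webbigdata/ALMA-7B-Ja"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# ALMA-style prompt: source language, source sentence, then the target label.
prompt = "Translate this from Japanese to English:\nJapanese: 今日はとても良い天気です。\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Decode only the newly generated tokens (everything after the prompt).
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
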
## Other Versions

### webbigdata-ALMA-7B-Ja-gguf

mmnga has created a llama.cpp (gguf) version, [webbigdata-ALMA-7B-Ja-gguf](https://huggingface.co/mmnga/webbigdata-ALMA-7B-Ja-gguf). Thank you!

llama.cpp is a tool for running LLMs on CPUs (it became popular primarily on Macs), and gguf is its current model file format. It can be used without a GPU.

[ALMA-7B-Ja-gguf Free Colab sample](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_gguf_Free_Colab_sample.ipynb)
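
If you prefer to run the gguf files locally instead of in Colab, a minimal CPU-only sketch with the `llama-cpp-python` bindings might look like the following. The file name is a placeholder; use whichever quantized file you download from the gguf repository.

```python
# Minimal CPU-only sketch using llama-cpp-python with a downloaded gguf file.
# "webbigdata-ALMA-7B-Ja-q4_K_M.gguf" is a placeholder name; pick one of the
# quantizations actually published in mmnga/webbigdata-ALMA-7B-Ja-gguf.
from llama_cpp import Llama

llm = Llama(model_path="./webbigdata-ALMA-7B-Ja-q4_K_M.gguf", n_ctx=512)

prompt = "Translate this from Japanese to English:\nJapanese: 今日はとても良い天気です。\nEnglish:"
result = llm(prompt, max_tokens=100, temperature=0.0)

print(result["choices"][0]["text"].strip())
```
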
### ALMA-7B-Ja-GPTQ-Ja-En

GPTQ is a quantization method that reduces model size. ALMA-7B-Ja-GPTQ-Ja-En is a GPTQ-quantized version of this model with reduced file size (3.9 GB) and memory usage.

However, performance is likely somewhat lower, and translation quality for languages other than Japanese and English has deteriorated significantly.

[Sample code for free Colab: webbigdata/ALMA-7B-Ja-GPTQ-Ja-En](https://huggingface.co/webbigdata/ALMA-7B-Ja-GPTQ-Ja-En)

If you want to translate an entire file at once, try the Colab notebook below.

[ALMA_7B_Ja_GPTQ_Ja_En_batch_translation_sample](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_GPTQ_Ja_En_batch_translation_sample.ipynb)
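
As a rough sketch, the GPTQ version can also be loaded through `transformers`, assuming the `auto-gptq` and `optimum` packages are installed so that the quantized weights can be deserialized; the settings below are illustrative and the Colab samples above remain the reference.

```python
# Minimal sketch: load the GPTQ-quantized weights via transformers
# (assumes auto-gptq and optimum are installed) and translate en -> ja.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Translate this from English to Japanese:\nEnglish: I love machine translation.\nJapanese:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
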
**ALMA** (**A**dvanced **L**anguage **M**odel-based tr**A**nslator) is an LLM-based translation model that adopts a new translation paradigm: it is first fine-tuned on monolingual data and then further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance.

Please find more details in their [paper](https://arxiv.org/abs/2309.11674).

```
@misc{xu2023paradigm,
      title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
      author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
      year={2023},
      eprint={2309.11674},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## About this work

- **This work was done by:** [webbigdata](https://webbigdata.jp/).