Model Card for gemma-2-2b-jpn-it-translate-gguf

gemma-2-2b-jpn-it-translate-gguf is an SLM (Small Language Model) specialized for Japanese-English and English-Japanese translation. Despite having only 2 billion (2B) parameters, it delivers translation quality approaching that of conventional 7 billion (7B) parameter models in some domains. Its file size is relatively small, at about 2GB, which allows fast execution.

Because the model was trained to translate one sentence at a time, passing long text that contains line breaks in a single request lowers translation quality.
When translating long text, pre-process it by splitting it into sentences before feeding it to the model.
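
The sentence-splitting step described above could be sketched roughly as follows. This is a minimal illustration rather than the preprocessing used for training or in the Colab sample: `split_sentences` and the punctuation sets are assumptions, and the rule is deliberately simple (for example, it will split after abbreviations such as "Mr.").

```python
import re

# Closing quotes/brackets that should stay attached to the sentence they end.
_CLOSERS = "」』）)\"'"
_CLOSER_CLASS = "[" + re.escape(_CLOSERS) + "]*"

# A sentence ends at Japanese/English terminators, or at "." when it is followed
# (after optional closers) by whitespace or end of line, so "3.14" is not split.
_SENTENCE_END = re.compile(
    r"(?:[。！？!?]+|\.(?=" + _CLOSER_CLASS + r"(?:\s|$)))" + _CLOSER_CLASS
)


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, treating every line break as a boundary."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start = 0
        for match in _SENTENCE_END.finditer(line):
            sentence = line[start:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            start = match.end()
        tail = line[start:].strip()  # text after the last terminator, if any
        if tail:
            sentences.append(tail)
    return sentences
```

Each returned item can then be sent to the model as its own user message.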

Sample Colab Script

If you have a Google account, you can try the model by clicking the Open in Colab button at the link below.
Colab sample

Sample for Windows

Colab's CPU is slow, so compiling llama.cpp and running it on your own computer gives a smoother experience.

Below is a sample that runs the model in a client/server configuration.

Start the server.

.\llama.cpp\build\bin\Release\llama-server -m .\gemma-2-2b-jpn-it-translate-Q4_K_L.gguf -c 2048 --override-kv tokenizer.ggml.add_bos_token=bool:false

Be sure to specify the --override-kv tokenizer.ggml.add_bos_token=bool:false option to avoid a duplicated BOS token.
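
To illustrate why the option matters: a prompt rendered with a Gemma-style chat template is expected to already start with a BOS marker, so a server that prepends its own would produce two. The sketch below only inspects the prompt text. `count_leading_bos` is a hypothetical helper, and the literal `<bos>` is an assumption about how the tokenizer spells its BOS token.

```python
BOS = "<bos>"  # assumed BOS token literal for Gemma-family tokenizers


def count_leading_bos(prompt: str, bos: str = BOS) -> int:
    """Count how many BOS markers appear back to back at the start of a prompt."""
    if not bos:
        raise ValueError("bos must be a non-empty string")
    count = 0
    # Step through the prompt one marker at a time from the beginning.
    while prompt.startswith(bos, count * len(bos)):
        count += 1
    return count
```

A count of 1 on the client-side prompt means the template has already supplied the BOS token, which is the situation in which the server should not add another.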

Install the client dependencies.

pip install -U transformers
pip install requests

Run the client script.

import json

import requests
from transformers import AutoTokenizer

system_prompt = "You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.\n\n"
instruct = "Translate English to Japanese.\nWhen translating, please use the following hints:\n[writing_style: casual]"

initial_messages  = [
    {"role": "user", "content": system_prompt + instruct},
    {"role": "assistant", "content": "OK"}
]

message_list = [
 "and I was a little bit nervous, too, speaking to a Japanese audience really for the first time, certainly since I left the White House. ",
 "And I had a very good interpreter, and if you have ever made a speech in Japan in English, it takes a lot longer to say it in Japanese. ",
 "I decided I would break the ice by telling the shortest joke that I knew.",
 "It was not the best joke I knew, but it was the shortest joke I knew, left over from my governor's campaign years before.",
 "So I told my joke, the interpreter told the joke, and the audience just collapsed in laughter. ",
 "I never got a better response from any audience in my life. ",
 "So I could not wait to get through the speech and talk to the interpreter and ask him,\"How did you tell my joke?\"",
 "He was very evasive. He would not tell me how he told it.",
 "I insisted, and he finally ducked his head and said,",
 "\"I told the audience, 'President Carter told a funny story. Everybody, laugh.'\""
]

tokenizer = AutoTokenizer.from_pretrained("webbigdata/gemma-2-2b-jpn-it-translate")

if __name__ == "__main__":
    messages = initial_messages.copy()
    for message in message_list:
        messages.append({"role": "user", "content": message})
        print("user: " + message)

        # If you don't want to use the Transformers tokenizer, refer to the Colab sample, which writes the prompt template by hand.
        prompt = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False
        )

        payload = {
            "prompt": prompt,
            "n_predict": 1200
        }
    
        # Define the URL and headers for the POST request
        url = "http://localhost:8080/completion"
        headers = {
            "Content-Type": "application/json"
        }

        # Send the POST request and capture the response
        response = requests.post(url, headers=headers, data=json.dumps(payload))
        # print(response)
        # print( response.json() )


        # Check if the request was successful
        if response.status_code != 200:
            print(f"Error: {response.text}")
            break  # stop here; an error body is not a translation result

        # Parse the response JSON
        response_data = response.json()

        # Extract the 'content' field from the response
        response_content = response_data.get('content', '').strip()

        print("assistant: " + response_content)
        messages.append({"role": "assistant",  "content": response_content})
        
        # Keep only the 6 most recent messages; keeping more requires more memory.
        if len(messages) > 8:  # 2 (initial) + 6 (new) = 8
            messages = initial_messages + messages[-6:]
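
The two steps that the script's comments describe (writing the prompt template by hand, and keeping only the most recent messages) could be sketched as standalone functions. This is an illustration under assumptions, not the Colab sample's code: `build_gemma_prompt` and `trim_history` are hypothetical names, and the turn markers assume the standard Gemma 2 chat format.

```python
def build_gemma_prompt(messages: list[dict]) -> str:
    """Render messages in the (assumed) Gemma 2 turn format, ending with an open model turn."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma calls the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # generation prompt
    return "".join(parts)


def trim_history(initial: list[dict], messages: list[dict], keep: int = 6) -> list[dict]:
    """Keep the fixed initial messages plus the last `keep` messages that follow them.

    `keep` should be even so the history still starts with a user turn.
    """
    recent = messages[len(initial):]
    if len(recent) <= keep:
        return list(messages)
    return initial + recent[-keep:]
```

With `keep=6` this reproduces the script's rule of 2 initial messages plus the 3 most recent user/assistant pairs.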

Result

user: and I was a little bit nervous, too, speaking to a Japanese audience really for the first time, certainly since I left the White House.
assistant: そして、私もちょっと緊張していました。白宮を離れてから、日本人の前で初めて、本当の意味で話しているというのは 、かなり初でした。
user: And I had a very good interpreter, and if you have ever made a speech in Japan in English, it takes a lot longer to say it in Japanese.
assistant: そして、私はとても優れた通訳がいて、もしあなたが英語で日本で講演をしたりしたことがあるのなら、日本語で話すの にはるかに長くなります。
user: I decided I would break the ice by telling the shortest joke that I knew.
assistant: そこで、最も短いジョークを話して、最初のうちは気を緩ませようと決めたのです。
user: It was not the best joke I knew, but it was the shortest joke I knew, left over from my governor's campaign years before.
assistant: 一番のジョークではないですが、私の知っていた短いジョーク、それまで governors の選挙でのキャンペーン中にいたものです。
user: So I told my joke, the interpreter told the joke, and the audience just collapsed in laughter.
assistant: それで、私が私のジョークを言ったら、通訳が私のジョークを、そしてオーディエンスはただただ笑い転げました。
user: I never got a better response from any audience in my life.
assistant: 私はこのジョークは、今までに、これ以上いい反応をもらったことがありませんでした。
user: So I could not wait to get through the speech and talk to the interpreter and ask him,"How did you tell my joke?"
assistant: それで、私は、その演説を終わらせて、通訳と話すのが待ちきれないで、その通訳に尋ねたいと思っていました、「どの ように私のジョークを伝えるの?」
user: He was very evasive. He would not tell me how he told it.
assistant: 彼はとても避けていました。どうしたのかと教えてくれません。
user: I insisted, and he finally ducked his head and said,
assistant: 断るまいと、彼はついに首をかしげて、「
user: "I told the audience, 'President Carter told a funny story. Everybody, laugh.'"
assistant: 「司会者から、カーター大統領が面白い話をした。皆さん、笑って。」

Benchmark results

Q4KL

test set     direction  spBLEU  chrF2++  comet   xlcomet
flores200v1  enja       21.94   30.8     0.8714  0.7496
flores200v1  jaen       21.28   52.2     0.8577  0.8965
wmt23        jaen       14.46   41.5     0.7859  0.8353
wmt20        enja       14.15   25.2     0.8519  0.6701
wmt23        enja       13.96   25.4     0.8339  0.7500
Business     jaen       19.14   42.7     0.8084  0.8309
wmt20        jaen       13.23   41.8     0.7807  0.7187
Business     enja       17.53   33.7     0.8809  0.8604

Acknowledgements

BibTeX:

@misc{dahara2024imatrix,
  author       = {dahara1@webbigdata},
  title        = {gemma-2-2b-jpn-it-translate: A translation task-specific gguf model based on gemma-2-2b-jpn-it},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/webbigdata/gemma-2-2b-jpn-it-translate-gguf/}},
  note         = {Accessed: 2024-10-10},
  abstract     = {This model was developed to verify how much Japanese-English and English-Japanese translation performance can be improved with the 2B gguf model.},
}
Format: GGUF
Model size: 2.61B params
Architecture: gemma2

Base model: google/gemma-2-2b