Tuti 🦜

This model is Gemma 2 9B fine-tuned with QLoRA (Unsloth's 4-bit quantization combined with LoRA) on Persian literature datasets that I curated, created, or collected from existing sources.
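QLoRA keeps the quantized base weights frozen and trains only small low-rank adapter matrices, which is why the adapter alone can be uploaded. The pure-Python sketch below shows the standard LoRA update W + (alpha / r) * B @ A for a single weight matrix. It is not Unsloth's implementation, and the function names and toy sizes are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(m)) for j in range(p)] for i in range(n)]


def lora_delta(B: Matrix, A: Matrix, alpha: float) -> Matrix:
    """Low-rank update (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(A)  # A is r x k, B is d x r
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]


def merge_lora(W: Matrix, B: Matrix, A: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A as a new matrix; the frozen W is not modified."""
    delta = lora_delta(B, A, alpha)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters in one adapter: B (d x r) plus A (r x k)."""
    return d * r + r * k


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen 2x2 base weight
    B = [[1.0], [2.0]]            # d x r with rank r = 1
    A = [[0.5, 0.5]]              # r x k
    print(merge_lora(W, B, A, alpha=2.0))
    print(lora_param_count(4096, 4096, 16), "vs", 4096 * 4096)
```

With r much smaller than d and k, an adapter has d*r + r*k trainable parameters instead of d*k. For an illustrative 4096 x 4096 matrix at r = 16, that is 131,072 versus 16,777,216.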

Use cases and datasets

Word IPA Detection

I fine-tuned this model with QLoRA and uploaded only the LoRA adapter. It can be used like this:

# pip install unsloth
from unsloth import FastLanguageModel
from transformers import TextStreamer

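model_name = "cnababaie/tuti"  # repo containing the LoRA adapter
max_seq_length = 4096  # Adjust as needed
dtype = None  # None = auto-detect (e.g. bfloat16 on GPUs that support it)
load_in_4bit = True  # load the base weights in 4-bit to reduce memory use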

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Alpaca-style prompt template
alpaca_prompt_template = """### Instruction:
{}

### Input:
{}

### Response:
{}"""
inputs = tokenizer(
[
    alpaca_prompt_template.format(
        "IPA این کلمه چیست؟", # instruction: "What is the IPA of this word?"
        "جوینده", # input: the word "juyande" ("seeker")
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

The model outputs the correct IPA transcription: "/d͡ʒuːjænde/ (juyande)".
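The examples stream their output with `TextStreamer`. To capture the answer as a string, the prompt can be built and the generated text cut at the `### Response:` marker. Below is a minimal sketch of two hypothetical helpers (`build_prompt`, `extract_response`) that reuse the template above. It assumes the decoded text contains the full prompt followed by the completion.

```python
# Same Alpaca-style template as in the usage example above.
ALPACA_PROMPT_TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, text: str) -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_PROMPT_TEMPLATE.format(instruction, text, "")


def extract_response(decoded: str) -> str:
    """Return only the text after the last '### Response:' marker.

    If the marker is missing, the whole string is returned stripped.
    """
    _, sep, answer = decoded.rpartition(RESPONSE_MARKER)
    return answer.strip() if sep else decoded.strip()


if __name__ == "__main__":
    prompt = build_prompt("IPA این کلمه چیست؟", "جوینده")
    # Simulate a decoded generation: prompt followed by the model's answer.
    print(extract_response(prompt + "/d͡ʒuːjænde/ (juyande)"))
```

With the real model, one option is to call `model.generate(**inputs, max_new_tokens=64)` without the streamer and pass `tokenizer.decode(outputs[0], skip_special_tokens=True)` to `extract_response`.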

IPA Sources

  • IPA-dict: Monolingual wordlists with pronunciation information in IPA
  • Wiktionary: The Persian Wiktionary does not contain IPA, but the English Wiktionary (which covers many words and phrases in languages other than English) includes many Persian words with their IPA
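The card does not say how these sources were turned into training data. Here is a hypothetical sketch of one possible conversion from an IPA-dict style wordlist into instruction/input/output records. It assumes each line has the form `word<TAB>/ipa/`, with alternative pronunciations separated by commas, and it reuses the instruction string from the example above. The romanization in parentheses that appears in the model's example output is not produced here.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Instruction string copied from the IPA usage example in this card.
IPA_INSTRUCTION = "IPA این کلمه چیست؟"


def parse_ipa_dict(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (word, [ipa, ...]) from lines assumed to be 'word<TAB>/ipa1/, /ipa2/'."""
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        word, ipa_field = line.split("\t", 1)
        prons = [p.strip() for p in ipa_field.split(",") if p.strip()]
        if prons:
            yield word.strip(), prons


def to_alpaca_records(entries: Iterable[Tuple[str, List[str]]]) -> List[Dict[str, str]]:
    """One record per word, keeping only the first listed pronunciation."""
    return [
        {"instruction": IPA_INSTRUCTION, "input": word, "output": prons[0]}
        for word, prons in entries
    ]


if __name__ == "__main__":
    sample = ["جوینده\t/d͡ʒuːjænde/", "", "malformed line without a tab"]
    for record in to_alpaca_records(parse_ipa_dict(sample)):
        print(json.dumps(record, ensure_ascii=False))
```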

Persian Text Romanization

inputs = tokenizer(
[
    alpaca_prompt_template.format(
        "این متن چه تلفظی داره؟", # instruction: "How is this text pronounced?"
        "خاک به خاطر بارش زیاد باران گل شد.", # input: "The soil turned to mud because of heavy rain."
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

This outputs the exact pronunciation: "Xāk be xāter-e bāreš-e ziyād-e bārān gel šod."
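In this romanization, the hyphenated `-e` (as in `xāter-e`, `bāreš-e`, `ziyād-e`) marks the Persian ezafe, a linking vowel that is pronounced but usually not written in Persian script. The hypothetical post-processing helper below pulls out the words that carry it. It assumes the model always writes the ezafe as a hyphen followed by `e` or `ye` at the end of a word.

```python
import re
from typing import List

# Assumption: ezafe is romanized as "-e" (or "-ye") attached to the end of a word.
EZAFE_RE = re.compile(r"^(?P<stem>.+?)-y?e$")


def ezafe_words(romanized: str) -> List[str]:
    """Return, in order, the stems of words that carry an ezafe marker."""
    stems = []
    for token in romanized.split():
        token = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        match = EZAFE_RE.match(token)
        if match:
            stems.append(match.group("stem"))
    return stems


if __name__ == "__main__":
    print(ezafe_words("Xāk be xāter-e bāreš-e ziyād-e bārān gel šod."))
```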

Romanization Sources

Persian Poem Translation

inputs = tokenizer(
[
    alpaca_prompt_template.format(
        "ترجمه", # instruction: "Translate"
        "برخیز بتا بیا ز بهر دل ما\r\nحل کن به جمال خویشتن مشکل ما\r\nیک کوزه شراب تا به هم نوش کن\r\nزآن پیش که کوزه‌ها کنند از گل ما", # input: a Persian quatrain, lines separated by "\r\n"
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

This outputs a rhymed English translation that preserves the content of the original poem:

"Arise, O idol, for our heart's sake, Solve our troubles with your beauty's make. One pot of wine, let's drink it all, Before they make pots from our clay's fall."

Poem Translation Sources

  • A list of randomly selected poems from Ganjoor, each paired with its translation
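The card does not describe how the Ganjoor pairs were formatted. Suppose they follow the same instruction/input/output layout as the prompts above, with verse lines joined by `\r\n` as in the example input. A conversion step might then look like the hypothetical sketch below. The function names and the JSON Lines output are illustrative choices.

```python
import json
from typing import Dict, List, Sequence

# Instruction string copied from the poem translation example in this card.
TRANSLATE_INSTRUCTION = "ترجمه"


def poem_to_input(lines: Sequence[str]) -> str:
    """Join non-empty verse lines with CRLF line breaks, as in the example input."""
    return "\r\n".join(line.strip() for line in lines if line.strip())


def make_translation_record(poem_lines: Sequence[str], translation: str) -> Dict[str, str]:
    """Build one instruction/input/output record from a poem and its translation."""
    return {
        "instruction": TRANSLATE_INSTRUCTION,
        "input": poem_to_input(poem_lines),
        "output": translation.strip(),
    }


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, keeping Persian text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    poem = ["برخیز بتا بیا ز بهر دل ما", "حل کن به جمال خویشتن مشکل ما"]
    print(to_jsonl([make_translation_record(poem, "Arise, O idol, for our heart's sake, ...")]))
```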

Model tree for cnababaie/tuti

  • Base model: google/gemma-2-9b
  • This model: cnababaie/tuti (fine-tuned from the base model)