license: mit
language:
  - fa
base_model:
  - HooshvareLab/gpt2-fa
  - openai-community/gpt2
tags:
  - art
  - poetry
  - Persian
  - Farsi
  - شعر

ManshoorAI

Overview

This project fine-tunes GPT-2 to generate Persian neo-poetry inspired by the works of Sohrab Sepehri and Forough Farokhzad.
The model is a work in progress. I look forward to hearing your thoughts.

Model Details

  • Base Model: GPT-2 (pretrained by OpenAI)
  • Intermediate Model: HooshvareLab/gpt2-fa
  • Dataset: Curated poems by Sohrab Sepehri and Forough Farokhzad
  • Fine-Tuning: PEFT/LoRA
  • Language: Persian (Farsi)
  • Output: Free-verse poetry with metaphorical depth

Installation & Usage

You can load the model with the Hugging Face transformers library; the example also uses hazm to normalize the Persian prompt:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from hazm import Normalizer

model_name = "rahiminia/manshoorai"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build the normalizer and the generation pipeline once and reuse them.
normalizer = Normalizer()
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

def generate_poetry(prompt, max_length=30):
    """Normalize the Persian prompt and return the generated text."""
    prompt = normalizer.normalize(prompt)
    output = generator(prompt, max_length=max_length)
    # The pipeline returns a list with one dict per generated sequence.
    return output[0]["generated_text"]

print(generate_poetry("شب آرام و خاموش"))
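
The example above passes only max_length to the pipeline. For poetry, sampling is often preferred over greedy decoding, so the sketch below collects a few standard transformers generation arguments in a dictionary and wraps a generator call so that several variants can be drawn for one prompt. The specific values and the `generate_variants` helper are assumptions for illustration, not settings recommended by the model author.

```python
# Illustrative decoding settings; every value here is an assumption to tune.
SAMPLING_KWARGS = {
    "max_new_tokens": 40,        # length of the continuation, excluding the prompt
    "do_sample": True,           # sample instead of greedy decoding
    "top_p": 0.92,               # nucleus sampling
    "temperature": 0.9,
    "repetition_penalty": 1.2,   # discourage repeated tokens
    "no_repeat_ngram_size": 3,   # block repeated 3-grams
}

def generate_variants(generator, prompt, n=3, **overrides):
    """Return n generated texts for a prompt.

    `generator` is any callable with the text-generation pipeline interface:
    it takes a prompt plus keyword arguments and returns a list of dicts
    that each contain a "generated_text" key.
    """
    kwargs = {**SAMPLING_KWARGS, "num_return_sequences": n, **overrides}
    return [item["generated_text"] for item in generator(prompt, **kwargs)]
```

With a text-generation pipeline for this model passed as `generator`, a call such as `generate_variants(generator, "باران که می‌بارد")` would return three candidate continuations to choose from.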

Training Details

  • Tokenizer: Byte Pair Encoding (BPE) tokenizer from HooshvareLab/gpt2-fa
  • Training: Fine-tuned using PyTorch and the transformers library
  • Hyperparameters: Adjusted learning rate and weight decay
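
The list above mentions learning rate and weight decay without giving values. As a minimal sketch of where these two hyperparameters enter a PyTorch training step, the snippet below configures AdamW on a small stand-in module. The numbers and the stand-in module are hypothetical; they are not the settings used to train ManshoorAI.

```python
import torch
from torch import nn

# Stand-in for the trainable (for example, LoRA) parameters of the model.
model = nn.Linear(16, 16)

# Hypothetical values; the model card does not state the ones actually used.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=2e-4,
    weight_decay=0.01,
)

# One illustrative optimization step on random data.
inputs = torch.randn(4, 16)
loss = model(inputs).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```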

Sample Outputs

Prompt: "باران که می‌بارد"

Generated Text:

  • ManshoorAI
    باران که می‌بارد من، به باغ راه یافته بودم
    من این دشت را دیدم
    که پر از درخت است
    و در آن برگ هایم هیچ گونه سبز نیست
    
  • Base Model (GPT2-fa)
    باران که می‌بارد با خود بگوید که دیگر چه شده بود؟ اگر آن جوان از پشت نرده‌ها به پایین میرفت؛
    

Limitations & Biases

  • This is a work in progress, with many improvements yet to be made.
  • The model may occasionally generate repetitive or incoherent lines.
  • It does not strictly follow classical Persian poetry rules but leans towards free verse.
  • Biases in the training dataset might influence stylistic preferences.
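
Because the model can repeat lines, a light post-processing pass may help when presenting outputs. The helper below is a hypothetical sketch, not part of the model or of any library: it drops lines that exactly repeat an earlier line once whitespace is normalized, and it keeps blank lines so stanza breaks survive.

```python
def drop_repeated_lines(text):
    """Remove lines that duplicate an earlier line, keeping first occurrences.

    Lines are compared after collapsing whitespace; blank lines are always kept.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        key = " ".join(line.split())
        if key and key in seen:
            continue  # exact repeat of an earlier non-blank line
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)

poem = "من این دشت را دیدم\nکه پر از درخت است\nمن این دشت را دیدم"
print(drop_repeated_lines(poem))
```

This only catches verbatim repeats; near-duplicates and incoherent lines would need a different check.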

Contributions & Feedback

If you use this model or have suggestions for improvement, feel free to open an issue or contribute via Hugging Face Spaces.

License

This model is released under the MIT License. Please ensure ethical use and proper attribution when sharing generated works.