---
license: mit
language:
- fa
base_model:
- HooshvareLab/gpt2-fa
- openai-community/gpt2
tags:
- art
- poetry
- Persian
- Farsi
- شعر
---
# ManshoorAI

## Overview
This project fine-tunes GPT-2 to generate Persian neo-poetry inspired by the works of Sohrab Sepehri and Forough Farokhzad.  
The model is a work in progress; I look forward to hearing your thoughts.

## Model Details
- **Base Model**: GPT-2 (pretrained by OpenAI)
- **Intermediate Model**: [HooshvareLab/gpt2-fa](https://huggingface.co/HooshvareLab/gpt2-fa)
- **Dataset**: Curated poems from Sohrab Sepehri and Forough Farokhzad
- **Fine-Tuning**: PEFT/LoRA
- **Language**: Persian (Farsi)
- **Output**: Free-verse poetry with metaphorical depth

## Installation & Usage
You can load the model using the HuggingFace `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from hazm import Normalizer

model_name = "rahiminia/manshoorai"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

def generate_poetry(prompt, max_length=30):
    # Normalize Persian spacing and characters before tokenization
    prompt = Normalizer().normalize(prompt)
    # The pipeline returns a list of dicts, one per generated sequence
    output = generator(prompt, max_length=max_length)
    return output[0]["generated_text"]

print(generate_poetry("شب آرام و خاموش"))
```
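Because the model leans toward free verse, sampling-based decoding usually yields more varied output than greedy decoding. A minimal sketch of sampling settings; the `temperature`, `top_p`, and `repetition_penalty` values are illustrative assumptions, not the settings used to produce the samples below:

```python
from transformers import pipeline

# Sampling-based decoding; these values are illustrative assumptions,
# not tuned hyperparameters for ManshoorAI.
generator = pipeline("text-generation", model="rahiminia/manshoorai")
output = generator(
    "شب آرام و خاموش",
    max_length=60,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.9,         # <1 sharpens, >1 flattens the distribution
    top_p=0.95,              # nucleus-sampling cutoff
    repetition_penalty=1.2,  # discourage repeated lines
)
print(output[0]["generated_text"])
```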

## Training Details
- **Tokenizer**: Byte-Pair Encoding (BPE) tokenizer from [HooshvareLab/gpt2-fa](https://huggingface.co/HooshvareLab/gpt2-fa)
- **Training**: Fine-tuned using PyTorch and the `transformers` library
- **Hyperparameters**: Adjusted learning rate and weight decay

## Sample Outputs
**Prompt**: "باران که می‌بارد"

**Generated Text**:
- ManshoorAI
    ```
    باران که می‌بارد من، به باغ راه یافته بودم
    من این دشت را دیدم
    که پر از درخت است
    و در آن برگ هایم هیچ گونه سبز نیست
    ```
- Base Model (GPT2-fa)
    ```
    باران که می‌بارد با خود بگوید که دیگر چه شده بود؟ اگر آن جوان از پشت نرده‌ها به پایین میرفت؛
    ```

## Limitations & Biases
- This is a work in progress, with many improvements yet to be made.
- The model may occasionally generate repetitive or incoherent lines.
- It does not strictly follow classical Persian poetry rules but leans towards free verse.
- Biases in the training dataset might influence stylistic preferences.

## Contributions & Feedback
If you use this model or have suggestions for improvement, feel free to open an issue or contribute via Hugging Face Spaces.

## License
This model is released under the MIT License. Please ensure ethical use and proper attribution when sharing generated works.