---
license: cc-by-nc-nd-4.0
inference:
  parameters:
    num_beams: 3
    num_beam_groups: 3
    num_return_sequences: 1
    repetition_penalty: 10.0
    diversity_penalty: 3.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 128
widget:
- text: >-
    Data scientists need to be able to communicate their findings to others in a clear and concise way.
  example_title: Data scientists
- text: >-
    Search engine optimization (SEO) is the practice of getting targeted traffic to a website from a search engine's organic rankings.
  example_title: SEO
- text: >-
    By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs.
  example_title: Fine Tuning
---


# Text Rewriter Paraphraser

This repository contains a text-rewriting (paraphrasing) model fine-tuned from T5-Base (223M parameters).

## Key Features:

* **Fine-tuned from t5-base:** Builds on a pre-trained text-to-text transfer model, which it adapts for paraphrasing.
* **Large Dataset (430k examples):** Trained on about 430k examples combined from three open-source datasets, which were cleaned before training.
* **High-Quality Paraphrases:** Aims to substantially restructure sentences while preserving their meaning and factual content.
* **Non-AI Detectable:** Aims to produce paraphrases that read naturally and are hard to distinguish from human-written text.

**Model Performance:**

* Train Loss: 1.0645
* Validation Loss: 0.8761
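
The card does not state what the losses measure. Assuming they are mean per-token cross-entropy in nats (the usual quantity logged by Hugging Face seq2seq training), they can be read as perplexities via perplexity = exp(loss). The sketch below shows that conversion; the function name `perplexity` is illustrative only.

```python
import math

# Losses as reported on this card. Assumption: each is a mean per-token
# cross-entropy measured in nats (natural log), so exp() gives perplexity.
TRAIN_LOSS = 1.0645
VALIDATION_LOSS = 0.8761


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


print(f"train perplexity: {perplexity(TRAIN_LOSS):.2f}")            # 2.90
print(f"validation perplexity: {perplexity(VALIDATION_LOSS):.2f}")  # 2.40
```

Under that assumption, the model is on average about as uncertain as choosing among two to three equally likely tokens at each step.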

## Getting Started:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Pass a Hugging Face access token only if the repository requires authentication;
# replace 'YOUR_TOKEN' with your own token
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
```
```python
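text = "Data science is a field that deals with extracting knowledge and insights from data."

# Tokenize the input and return PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")

# Generate a paraphrase of at most 50 tokens
output = model.generate(**inputs, max_length=50)

# skip_special_tokens=True removes markers such as <pad> and </s> from the text
print(tokenizer.decode(output[0], skip_special_tokens=True))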
```
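
The YAML front matter lists the generation settings used by the hosted inference widget: group (diverse) beam search with 3 beams in 3 groups, a strong repetition penalty, and no repeated bigrams. The sketch below reuses those values with `model.generate`. It assumes a `transformers` version that still supports `num_beam_groups` and `diversity_penalty` (support may vary across releases), and the helper names `build_generation_kwargs` and `paraphrase` are hypothetical, not part of this repository. `temperature` is left out because it only applies when sampling (`do_sample=True`), which group beam search does not use.

```python
# Settings mirroring the `inference.parameters` block of this card's YAML
# front matter (temperature omitted: it has no effect without sampling).
GENERATION_KWARGS = {
    "num_beams": 3,
    "num_beam_groups": 3,
    "num_return_sequences": 1,
    "repetition_penalty": 10.0,
    "diversity_penalty": 3.01,
    "no_repeat_ngram_size": 2,
    "max_length": 128,
}


def build_generation_kwargs(**overrides):
    """Return a copy of the default settings with any overrides applied."""
    kwargs = dict(GENERATION_KWARGS)
    kwargs.update(overrides)
    # Group beam search splits the beams evenly across groups, so num_beams
    # must be divisible by num_beam_groups.
    if kwargs["num_beams"] % kwargs["num_beam_groups"] != 0:
        raise ValueError("num_beams must be divisible by num_beam_groups")
    # Beam search cannot return more sequences than it keeps beams.
    if kwargs["num_return_sequences"] > kwargs["num_beams"]:
        raise ValueError("num_return_sequences cannot exceed num_beams")
    return kwargs


def paraphrase(text, tokenizer, model, **overrides):
    """Return a list of decoded paraphrases of `text` (hypothetical helper)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, **build_generation_kwargs(**overrides))
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the `tokenizer` and `model` from the Getting Started snippet, `paraphrase(text, tokenizer, model, num_return_sequences=3)` returns three candidates, one per beam group, which is where the diversity penalty has a visible effect.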

**Disclaimer:**

This model is intended for research and creative writing purposes. Use the paraphrased text responsibly and ethically, and properly attribute the original source.

**Further Development:**

Please raise ongoing development questions and suggestions for future improvement in the repository's Discussions.