---
license: cc-by-nc-nd-4.0
inference:
  parameters:
    num_beams: 3
    num_beam_groups: 3
    num_return_sequences: 1
    repetition_penalty: 10.0
    diversity_penalty: 3.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 128
widget:
  - text: >-
      Data scientists need to be able to communicate their findings to others in a clear and concise way.
    example_title: Data scientists
  - text: >-
      Search engine optimization (SEO) is the practice of getting targeted traffic to a website from a search engine's organic rankings.
    example_title: SEO
  - text: >-
      By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs.
    example_title: Fine Tuning
---

# Text Rewriter Paraphraser

This repository contains a text-rewriting (paraphrasing) model fine-tuned from T5-Base, which has 223M parameters.

## Key Features:

* **Fine-tuned on t5-base:** Builds on a pre-trained text-to-text transfer model for effective paraphrasing.
* **Large Dataset (430k examples):** Trained on a dataset that combines three open-source sources and was cleaned before training.
* **High-Quality Paraphrases:** Generates paraphrases that significantly alter sentence structure while preserving the meaning and factual content of the input.
* **Non-AI Detectable:** Aims to produce paraphrases that read naturally and are indistinguishable from human-written text.

**Model Performance:**

* Train Loss: 1.0645
* Validation Loss: 0.8761

## Getting Started:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace 'YOUR_TOKEN' with your actual Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
```
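
```python
text = "Data science is a field that deals with extracting knowledge and insights from data."

# Tokenize the input and return PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")

# Generate a paraphrase of at most 50 tokens
output = model.generate(**inputs, max_length=50)

# Decode the first sequence, dropping special tokens such as <pad> and </s>
print(tokenizer.decode(output[0], skip_special_tokens=True))
```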
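
The metadata at the top of this card lists decoding settings for the hosted inference widget (`num_beams`, `num_beam_groups`, `diversity_penalty`, and so on). The sketch below shows one way those same values could be passed to `model.generate` locally. It rests on assumptions: `WIDGET_GENERATION_PARAMS`, `build_generation_kwargs`, and `paraphrase` are hypothetical names introduced here, not part of this repository, and reusing the widget values as local defaults is a guess rather than a documented recommendation. Support for grouped (diverse) beam search has also varied across `transformers` releases, so check the version you have installed.

```python
# Decoding settings copied from the `inference.parameters` block of this
# model card's metadata. Treating them as good defaults for local use is an
# assumption; the card only states them as hosted-widget settings.
WIDGET_GENERATION_PARAMS = {
    "num_beams": 3,
    "num_beam_groups": 3,       # must divide num_beams evenly
    "num_return_sequences": 1,
    "repetition_penalty": 10.0,
    "diversity_penalty": 3.01,  # only used when num_beam_groups > 1
    "no_repeat_ngram_size": 2,
    "max_length": 128,
    # `temperature: 0.8` also appears in the metadata, but it only affects
    # sampling (do_sample=True), so it is left out of this beam-search sketch.
}


def build_generation_kwargs(**overrides):
    """Return a copy of the widget settings with optional overrides applied."""
    kwargs = dict(WIDGET_GENERATION_PARAMS)
    kwargs.update(overrides)
    if kwargs["num_beams"] % kwargs["num_beam_groups"] != 0:
        raise ValueError("num_beams must be divisible by num_beam_groups")
    if kwargs["num_return_sequences"] > kwargs["num_beams"]:
        raise ValueError("num_return_sequences cannot exceed num_beams")
    return kwargs


def paraphrase(text, model, tokenizer, **overrides):
    """Generate paraphrases of `text` (hypothetical helper, not part of the repo)."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, **build_generation_kwargs(**overrides))
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Usage, with `model` and `tokenizer` loaded as in the snippet above:
# for candidate in paraphrase("Data science ...", model, tokenizer):
#     print(candidate)
```

Keeping the settings in a plain dictionary makes it easy to override a single value per call, for example `paraphrase(text, model, tokenizer, max_length=64)`, while the checks catch beam settings that cannot work together before generation starts.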

**Disclaimer:**

This model is intended for research and creative writing purposes. It is essential to use the paraphrased text responsibly and ethically, with proper attribution of the original source.

**Further Development:**

Suggestions for future improvements are welcome in the repository's Discussions.