kalpeshk2011 committed
Commit 75958be
1 Parent(s): 39161e8
Update README.md

README.md CHANGED
@@ -34,7 +34,7 @@ DIPPER ("**Di**scourse **P**ara**p**hras**er**") is a 11B parameter paraphrase g
 
 We leverage the PAR3 dataset publicly released by Thai et al. (2022) to train DIPPER. This dataset contains multiple translations of non-English novels into English aligned at a paragraph level (e.g., it contains both the Henry Morley and Robert Adams translations of Voltaire’s Candide), which we treat as paragraph-level paraphrases and use to train our paraphraser.
 
-## Using DIPPER
+## Using DIPPER (no-context)
 
 Full instructions: https://github.com/martiansideofthemoon/ai-detection-paraphrases#running-the-paraphraser-model-dipper
 
@@ -72,8 +72,7 @@ class DipperParaphraser(object):
 
         for sent_idx in range(0, len(sentences), sent_interval):
             curr_sent_window = " ".join(sentences[sent_idx:sent_idx + sent_interval])
-            final_input_text = f"lexical = {lex_code}, order = {order_code}"
-            final_input_text += f" <sent> {curr_sent_window} </sent>"
+            final_input_text = f"lexical = {lex_code}, order = {order_code} {curr_sent_window}"
 
         final_input = self.tokenizer([final_input_text], return_tensors="pt")
         final_input = {k: v.cuda() for k, v in final_input.items()}
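For readers who want to try the no-context input format from this diff directly, here is a minimal sketch. Everything outside the `final_input_text` line is an assumption rather than part of this commit: the tokenizer/checkpoint names (`google/t5-v1_1-xxl`, `kalpeshk2011/dipper-paraphraser-xxl-no-context`), the control-code convention (code = 100 - diversity, with diversity in {0, 20, ..., 100}), and the sampling settings follow the linked full instructions and may differ from the canonical script.

```python
# Minimal sketch of the no-context input format shown in the diff above.
# ASSUMPTIONS (not part of this commit): the checkpoint id below is a
# hypothetical name for the no-context variant, and the control-code
# convention and sampling settings are taken from the linked instructions.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
model = T5ForConditionalGeneration.from_pretrained(
    "kalpeshk2011/dipper-paraphraser-xxl-no-context"  # hypothetical checkpoint id
)
model.cuda().eval()

# Control codes: code = 100 - diversity, diversity in {0, 20, 40, 60, 80, 100}.
lex_code, order_code = 40, 100  # lexical diversity 60, order diversity 0

sentence = "The quick brown fox jumps over the lazy dog."

# No-context format: control codes followed directly by the raw text,
# with no "<sent> ... </sent>" markers and no prefix context.
final_input_text = f"lexical = {lex_code}, order = {order_code} {sentence}"

final_input = tokenizer([final_input_text], return_tensors="pt")
final_input = {k: v.cuda() for k, v in final_input.items()}

with torch.inference_mode():
    outputs = model.generate(
        **final_input, do_sample=True, top_p=0.75, max_length=512
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The context-aware checkpoint instead wraps the target sentences in `<sent> ... </sent>` after the control codes, which is exactly the format the removed lines in the diff built.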