kalpeshk2011 committed
Commit 75958be
1 Parent(s): 39161e8
Update README.md

README.md CHANGED
@@ -34,7 +34,7 @@ DIPPER ("**Di**scourse **P**ara**p**hras**er**") is a 11B parameter paraphrase g
 
 We leverage the PAR3 dataset publicly released by Thai et al. (2022) to train DIPPER. This dataset contains multiple translations of non-English novels into English aligned at a paragraph level (e.g., it contains both the Henry Morley and Robert Adams translations of Voltaire’s Candide), which we treat as paragraph-level paraphrases and use to train our paraphraser.
 
-## Using DIPPER
+## Using DIPPER (no-context)
 
 Full instructions: https://github.com/martiansideofthemoon/ai-detection-paraphrases#running-the-paraphraser-model-dipper
 
@@ -72,8 +72,7 @@ class DipperParaphraser(object):
 
         for sent_idx in range(0, len(sentences), sent_interval):
             curr_sent_window = " ".join(sentences[sent_idx:sent_idx + sent_interval])
-            final_input_text = f"lexical = {lex_code}, order = {order_code}"
-            final_input_text += f" <sent> {curr_sent_window} </sent>"
+            final_input_text = f"lexical = {lex_code}, order = {order_code} {curr_sent_window}"
 
         final_input = self.tokenizer([final_input_text], return_tensors="pt")
         final_input = {k: v.cuda() for k, v in final_input.items()}
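For readers who want to try the no-context input format from this diff directly, here is a minimal sketch. Everything outside the `final_input_text` line is an assumption rather than part of this commit: the tokenizer/checkpoint names (`google/t5-v1_1-xxl`, `kalpeshk2011/dipper-paraphraser-xxl-no-context`), the control-code convention (code = 100 - diversity, with diversity in {0, 20, ..., 100}), and the sampling settings follow the linked full instructions and may differ from the canonical script.

```python
# Minimal sketch of the no-context input format shown in the diff above.
# ASSUMPTIONS (not part of this commit): the checkpoint id below is a
# hypothetical name for the no-context variant, and the control-code
# convention and sampling settings are taken from the linked instructions.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
model = T5ForConditionalGeneration.from_pretrained(
    "kalpeshk2011/dipper-paraphraser-xxl-no-context"  # hypothetical checkpoint id
)
model.cuda().eval()

# Control codes: code = 100 - diversity, diversity in {0, 20, 40, 60, 80, 100}.
lex_code, order_code = 40, 100  # lexical diversity 60, order diversity 0

sentence = "The quick brown fox jumps over the lazy dog."

# No-context format: control codes followed directly by the raw text,
# with no "<sent> ... </sent>" markers and no prefix context.
final_input_text = f"lexical = {lex_code}, order = {order_code} {sentence}"

final_input = tokenizer([final_input_text], return_tensors="pt")
final_input = {k: v.cuda() for k, v in final_input.items()}

with torch.inference_mode():
    outputs = model.generate(
        **final_input, do_sample=True, top_p=0.75, max_length=512
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The context-aware checkpoint instead wraps the target sentences in `<sent> ... </sent>` after the control codes, which is exactly the format the removed lines in the diff built.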