---
language: 
  - zh
license: apache-2.0
datasets:
- TencentKuaibao
metrics:
- bleu
- rouge
---
## Model
- A news comment generation model built on the Chinese [MengziT5](https://huggingface.co/Langboat/mengzi-t5-base)
- The dataset comes from the paper ["Coherent Comment Generation for Chinese Articles with a Graph-to-Sequence Model"](https://github.com/lancopku/Graph-to-seq-comment-generation)

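## Generating comments
- The hosted inference API returns only one comment per request. Calling `model.generate()` yourself with sampling parameters such as `do_sample` and `num_return_sequences` lets the model produce several different comments.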

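```Python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Tokenizer from the base Mengzi T5 checkpoint; weights from the fine-tuned comment model.
t5_tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("wawaup/MengziT5-Comment")

def generate_comment(input_ids, cnt_num):
    """Sample `cnt_num` comments for an already tokenized article (`input_ids`)."""
    outputs = model.generate(input_ids,
                            max_length=128,
                            do_sample=True,           # sample instead of greedy decoding
                            temperature=0.9,
                            repetition_penalty=10.0,
                            top_p=0.5,                # nucleus (top-p) sampling
                            num_return_sequences=cnt_num)
    print(outputs)  # raw token ids, one row per generated comment
    # Decode each sequence back to text, dropping special tokens such as padding.
    preds_cleaned = [t5_tokenizer.decode(ids, skip_special_tokens=True,
                            clean_up_tokenization_spaces=True) for ids in outputs]
    print(preds_cleaned)
    return preds_cleaned
```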
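
The variety between returned comments comes from sampling: `temperature` rescales the token scores and `top_p` limits each draw to the most probable tokens. The sketch below illustrates that idea on a toy score vector using only the standard library. It is not the implementation used by `transformers`; `top_p_sample` and `toy_logits` are hypothetical names introduced here for illustration.

```python
import math
import random

def top_p_sample(logits, temperature=0.9, top_p=0.5, rng=random):
    """Sample one token index using temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most probable tokens whose cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the nucleus; random.choices handles unnormalized weights.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

# Hypothetical scores for a five-token vocabulary (not taken from the model).
toy_logits = [2.0, 1.5, 0.2, -1.0, -3.0]
rng = random.Random(0)
narrow = [top_p_sample(toy_logits, top_p=0.5, rng=rng) for _ in range(10)]
wide = [top_p_sample(toy_logits, top_p=0.95, rng=rng) for _ in range(10)]
print("top_p=0.5 :", narrow)
print("top_p=0.95:", wide)
```

With these toy scores the most probable token alone already exceeds a cumulative probability of 0.5, so `top_p=0.5` always picks it, while `top_p=0.95` admits the top three tokens. A low `top_p` therefore keeps the sampled comments close to each other, and raising it trades coherence for diversity.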