---
language:
- ru
tags:
- mbart
inference:
  parameters:
    no_repeat_ngram_size: 4
    num_beams: 5
datasets:
- IlyaGusev/gazeta
- samsum
- samsum_(translated_into_Russian)
widget:
- text: >
    Джефф: Могу ли я обучить модель 🤗 Transformers на Amazon SageMaker?
    Филипп: Конечно, вы можете использовать новый контейнер для глубокого
    обучения HuggingFace.
    Джефф: Хорошо.
    Джефф: и как я могу начать?
    Джефф: где я могу найти документацию?
    Филипп: ок, ок, здесь можно найти все:
    https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face
model-index:
- name: mbart_ruDialogSum
  results:
  - task:
      name: Abstractive Dialogue Summarization
      type: abstractive-text-summarization
    dataset:
      name: SAMSum Corpus (translated to Russian)
      type: samsum
    metrics:
    - name: Validation ROUGE-1
      type: rouge-1
      value: 34.5
    - name: Validation ROUGE-L
      type: rouge-l
      value: 33
    - name: Test ROUGE-1
      type: rouge-1
      value: 31
    - name: Test ROUGE-L
      type: rouge-l
      value: 28
license: cc
---
### 📝 Description
MBart for Russian summarization, fine-tuned for **dialogue** summarization.
This model was first fine-tuned by [Ilya Gusev](https://hf.co/IlyaGusev) on the [Gazeta dataset](https://huggingface.co/datasets/IlyaGusev/gazeta). We then **fine-tuned** that model on the [SAMSum dataset](https://huggingface.co/datasets/samsum) **translated into Russian** using the Google Translate API.
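The exact translation pipeline is not published here; as a rough illustrative sketch, translating SAMSum into Russian could look like the snippet below, where the `deep_translator` package stands in for the Google Translate API:

```python
# Illustrative sketch only: not the exact pipeline used for this model.
# `deep_translator` is assumed here as a stand-in for the Google Translate API.
from datasets import load_dataset
from deep_translator import GoogleTranslator

translator = GoogleTranslator(source="en", target="ru")

def to_russian(example):
    # SAMSum examples contain a "dialogue" and a reference "summary"
    example["dialogue"] = translator.translate(example["dialogue"])
    example["summary"] = translator.translate(example["summary"])
    return example

samsum_ru = load_dataset("samsum", split="train").map(to_russian)
```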
🤗 Moreover, we have built a **Telegram bot [@summarization_bot](https://t.me/summarization_bot)** that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗
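The bot's own source code is not part of this card; the sketch below only shows how such a bot could be wired up, assuming the `python-telegram-bot` (v20+) library and the `pipeline` API rather than the bot's actual implementation:

```python
# Hypothetical wiring, NOT the bot's actual source code.
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters
from transformers import pipeline

summarizer = pipeline("summarization", model="Kirili4ik/mbart_ruDialogSum")

async def summarize(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # Reply to each incoming text message with its model-generated summary
    summary = summarizer(update.message.text)[0]["summary_text"]
    await update.message.reply_text(summary)

app = ApplicationBuilder().token("YOUR_BOT_TOKEN").build()  # placeholder token
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, summarize))
app.run_polling()
```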
### ❓ How to use with code
```python
from transformers import MBartTokenizer, MBartForConditionalGeneration

# Download the model and its tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

article_text = "..."

# Tokenize the dialogue (pad/truncate to 600 tokens)
input_ids = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Generate the summary with beam search
output_ids = model.generate(
    input_ids=input_ids,
    top_k=0,
    num_beams=3,
    no_repeat_ngram_size=3,
)[0]

summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
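Alternatively, the model can be used through the high-level `pipeline` API; the snippet below is a minimal sketch that simply relies on the repository's default generation settings:

```python
from transformers import pipeline

# Loads both the model and the tokenizer from the Hub
summarizer = pipeline("summarization", model="Kirili4ik/mbart_ruDialogSum")

dialogue = "..."  # a Russian dialogue, one utterance per line
print(summarizer(dialogue)[0]["summary_text"])
```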