---
library_name: transformers
license: apache-2.0
language:
- he
base_model:
- onlplab/alephbert-base
---

# Hebrew Punctuation Model
## Introduction
This model is a fine-tuned version of AlephBERT (`onlplab/alephbert-base`) that restores punctuation in transcripts of spoken Hebrew. It is intended as a post-processing step for Automatic Speech Recognition (ASR) output, since raw ASR transcriptions often lack punctuation.
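Punctuation restoration is commonly framed as token classification: the model reads unpunctuated words and predicts, for each word, which punctuation mark (if any) should follow it. The sketch below illustrates only the final reconstruction step under that assumption. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`) and the helper `apply_punctuation` are hypothetical and are not taken from this model's configuration.

```python
# Hypothetical label set: each label names the mark that follows a word.
# This model's actual labels may differ; check its config before relying on them.
LABEL_TO_MARK = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def apply_punctuation(words: list[str], labels: list[str]) -> str:
    """Attach the predicted mark after each word and join the words with spaces."""
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    return " ".join(word + LABEL_TO_MARK[label] for word, label in zip(words, labels))


# "Hello how are you" -> "Hello, how are you?"
print(apply_punctuation(["שלום", "מה", "שלומך"], ["COMMA", "O", "QUESTION"]))
```

Keeping the labels word-level (rather than subword-level) makes the reconstruction a simple join; a real pipeline would also need to map subword predictions back to whole words.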

## Usage
Currently, the recommended way to use this model is to clone its repository, which contains the `src` package used by the inference code:

```bash
git lfs install
git clone https://huggingface.co/verbit/hebrew_punctuation
cd hebrew_punctuation
```

From inside the cloned folder, you can run:

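```python
from transformers import BertTokenizer

from src.models import BertForPunctuation
from src.inference import get_prediction

# Load the fine-tuned model and its tokenizer
model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation")
tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation")
model.eval()  # inference mode (disables dropout)

# Unpunctuated Hebrew input, as a raw ASR transcript would look. Roughly:
# "Verbit has developed a transcription system based on artificial intelligence
# and human input, and is working on transcribing Holocaust survivors' testimonies;
# the results can already be seen online, including parts of the testimony of
# Tuvia Bielski, who commanded the Jewish partisan battalion in Belarus."
text = ("חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה את "
        "התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים "
        "בביילורוסיה")
punct_text = get_prediction(
    model=model,
    text=text,
    tokenizer=tokenizer,
    backward_context=model.config.backward_context,
    forward_context=model.config.forward_context,
    return_prob=False,
)
print(punct_text)
```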
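`get_prediction` takes `backward_context` and `forward_context` from the model config. A plausible reading, which this card does not confirm, is that long transcripts are processed in overlapping windows: each window keeps predictions only for a central span of words, while extra words on either side give the model context. The hypothetical helper `context_windows` below only computes such window boundaries in pure Python; it is not the repository's implementation, and the `chunk` size is an assumed parameter.

```python
def context_windows(
    n_words: int, chunk: int, backward: int, forward: int
) -> list[tuple[int, int, int, int]]:
    """Return (start, center_start, center_end, end) word indices per window.

    Predictions would be kept only for words in [center_start, center_end);
    words in [start, center_start) and [center_end, end) are context only.
    Every word falls in exactly one central span.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    windows = []
    for center_start in range(0, n_words, chunk):
        center_end = min(center_start + chunk, n_words)
        start = max(0, center_start - backward)  # up to `backward` words before
        end = min(n_words, center_end + forward)  # up to `forward` words after
        windows.append((start, center_start, center_end, end))
    return windows


words = "a b c d e f g h i j".split()
for start, cs, ce, end in context_windows(len(words), chunk=4, backward=2, forward=3):
    # left context, central span, right context
    print(words[start:cs], words[cs:ce], words[ce:end])
```

Overlapping context lets a word near a chunk boundary still see its neighbours, while the non-overlapping central spans ensure each word receives exactly one prediction.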

## Contact

For any questions or issues, please contact [email protected].