---
library_name: transformers
license: apache-2.0
language:
- he
base_model:
- onlplab/alephbert-base
---

# Hebrew Punctuation Model

## Introduction

This model is a fine-tuned version of AlephBERT (`onlplab/alephbert-base`) that restores punctuation in transcripts of spoken Hebrew. It is trained specifically as a post-processing step for Automatic Speech Recognition (ASR) output, because raw ASR transcriptions usually contain no punctuation.
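
Punctuation restoration on ASR output is commonly framed as token classification: for each word, a model predicts which punctuation mark, if any, should follow it. The sketch below illustrates only the last step of that framing, turning per-word predictions back into text. The label names, the `LABEL_TO_PUNCT` mapping, and `apply_punctuation` are hypothetical and are not taken from this repository; the actual label set used by `BertForPunctuation` is not documented here.

```python
from typing import Dict, List

# Hypothetical label set, for illustration only; not the model's real labels.
LABEL_TO_PUNCT: Dict[str, str] = {
    "O": "",          # no punctuation after this word
    "COMMA": ",",
    "PERIOD": ".",
    "QUESTION": "?",
}


def apply_punctuation(words: List[str], labels: List[str]) -> str:
    """Append the mark for each predicted label after its word, then join with spaces."""
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    return " ".join(word + LABEL_TO_PUNCT[label] for word, label in zip(words, labels))


if __name__ == "__main__":
    words = "hello how are you i am fine".split()
    labels = ["COMMA", "O", "O", "QUESTION", "O", "O", "PERIOD"]
    print(apply_punctuation(words, labels))  # hello, how are you? i am fine.
```

This sketch only appends punctuation marks and leaves the words untouched; for Hebrew that is sufficient, since the script has no upper and lower case.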

## Usage

For now, the recommended way to use this model is to clone the repository, which contains the custom model code in its `src` package:

```bash
git lfs install
git clone https://huggingface.co/verbit/hebrew_punctuation
cd hebrew_punctuation
```

From inside the cloned folder, you can then run:

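```python
from transformers import BertTokenizer

# Custom model class and inference helper from the cloned repository's src package
from src.models import BertForPunctuation
from src.inference import get_prediction

model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation")
tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation")
model.eval()  # inference mode: disables dropout

# Unpunctuated, ASR-style Hebrew input. English gloss: "Verbit developed a transcription
# system based on artificial intelligence and a human element, and is working on
# transcribing testimonies of Holocaust survivors; the results can already be seen online,
# including parts of the testimony of Tuvia Bielski, who was commander of the Jewish
# partisan unit in Belarus."
text = ("חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה את "
        "התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים "
        "בביילורוסיה")

punct_text = get_prediction(
    model=model,
    text=text,
    tokenizer=tokenizer,
    # Context sizes are read from the model's configuration
    backward_context=model.config.backward_context,
    forward_context=model.config.forward_context,
    return_prob=False,
)
print(punct_text)
```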
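
The call above passes `backward_context` and `forward_context` from the model config. Their exact meaning is defined in the repository's `src.inference` module, which is not reproduced here. One common scheme they may correspond to (an assumption, not confirmed by this card) is a sliding window in which each position is classified using a fixed number of neighbouring units on each side. The hypothetical helper `context_windows` below sketches that scheme over words; it is not part of the repository, and the real code may count subword tokens rather than words.

```python
from typing import List, Tuple

# One window: (words before the target, target word, words after the target).
Window = Tuple[List[str], str, List[str]]


def context_windows(words: List[str], backward_context: int, forward_context: int) -> List[Window]:
    """Hypothetical helper: build one context window per word.

    Each window holds up to `backward_context` words before the target and up to
    `forward_context` words after it, truncated at the start and end of the text.
    """
    if backward_context < 0 or forward_context < 0:
        raise ValueError("context sizes must be non-negative")
    windows: List[Window] = []
    for i, word in enumerate(words):
        left = words[max(0, i - backward_context):i]
        right = words[i + 1:i + 1 + forward_context]
        windows.append((left, word, right))
    return windows


if __name__ == "__main__":
    for left, target, right in context_windows("a b c d e".split(), backward_context=2, forward_context=1):
        print(left, target, right)
```

Under such a scheme, a larger forward context lets the model see how an utterance continues before deciding whether a word ends a sentence, at the cost of more computation per word.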

## Contact

For any questions or issues, please contact [email protected].