---
library_name: transformers
license: apache-2.0
language:
- he
base_model:
- onlplab/alephbert-base
---

# Hebrew Punctuation model
## Introduction
This model is a fine-tuned version of AlephBERT, designed to restore punctuation in Hebrew spoken language transcripts. It is specifically trained as a post-processing step for Automatic Speech Recognition (ASR) outputs, where punctuation is often missing in raw transcriptions.

## Install
```
git lfs install 
git clone https://huggingface.co/verbit/hebrew_punctuation
cd hebrew_punctuation
python -m venv .env
source .env/bin/activate
pip install -r requirements.txt
```
## Usage
For now this is the recommended way to use this model:

```
from transformers import BertTokenizer

from src.models import BertForPunctuation
from src.inference import get_prediction

model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation")
tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation")
model.eval()

text = """חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה 
את התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים בביילורוסיה"""

punct_text = get_prediction(
    model=model,
    text=text,
    tokenizer=tokenizer,
    backward_context=model.config.backward_context,
    forward_context=model.config.forward_context,
)
print(punct_text)
```

## Contact

For any questions or issues, please contact research.team@verbit.ai.