--- library_name: transformers license: apache-2.0 language: - he base_model: - onlplab/alephbert-base --- # Hebrew Punctuation model ## Introduction This model is a fine-tuned version of AlephBERT, designed to restore punctuation in Hebrew spoken language transcripts. It is specifically trained as a post-processing step for Automatic Speech Recognition (ASR) outputs, where punctuation is often missing in raw transcriptions. ## Usage For now this is the recommended way to use this model: ``` git lfs install git clone https://huggingface.co/verbit/hebrew_punctuation cd hebrew_punctuation ``` Once you are in the folder you could do the following: ``` from transformers import BertTokenizer from src.models import BertForPunctuation from src.inference import get_prediction model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation") tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation") model.eval() text = ("חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה את " "התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים " "בביילורוסיה") punct_text = get_prediction( model=model, text=text, tokenizer=tokenizer, backward_context=model.config.backward_context, forward_context=model.config.forward_context, return_prob=False ) print(punct_text) ``` ## Contact For any questions or issues, please contact research.team@verbit.ai.