	---
	library_name: transformers
	license: apache-2.0
	language:
	- he
	base_model:
	- onlplab/alephbert-base
	---

	# Hebrew Punctuation model
	## Introduction
	This model is a fine-tuned version of AlephBERT that restores punctuation in transcripts of spoken Hebrew. It is trained specifically as a post-processing step for Automatic Speech Recognition (ASR) output, which typically lacks punctuation.
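
To illustrate the kind of input the model expects, the sketch below strips punctuation from already-punctuated text to simulate a raw ASR transcript, for example when putting together a quick evaluation set. The `strip_punctuation` helper and the set of marks it removes are assumptions made for illustration; this model card does not list which marks the model predicts.

```python
import re

# Assumed set of marks to remove; not taken from the model card.
# Note: this also splits decimals such as "3.5" into "3 5".
PUNCT_PATTERN = re.compile(r"[.,?!:;]")


def strip_punctuation(text: str) -> str:
    """Replace common punctuation marks with spaces and collapse whitespace."""
    no_punct = PUNCT_PATTERN.sub(" ", text)
    return " ".join(no_punct.split())


example = "שלום, מה שלומך? אני בסדר."
print(strip_punctuation(example))
```

The Hebrew maqaf (־) is left in place on purpose, since it joins words rather than marking a pause.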

	## Usage
	For now, the recommended way to use this model is to clone the repository:

	```bash
	git lfs install
	git clone https://huggingface.co/verbit/hebrew_punctuation
	cd hebrew_punctuation
	```

	From inside the cloned folder, you can run the following:

	```python
	from transformers import BertTokenizer

	# Custom model class and inference helper from the cloned repository
	from src.models import BertForPunctuation
	from src.inference import get_prediction

	model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation")
	tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation")
	model.eval()  # switch to inference mode

	# Unpunctuated Hebrew sample text
	text = (
	    "חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה את "
	    "התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים "
	    "בביילורוסיה"
	)
	punct_text = get_prediction(
	    model=model,
	    text=text,
	    tokenizer=tokenizer,
	    backward_context=model.config.backward_context,
	    forward_context=model.config.forward_context,
	    return_prob=False,
	)
	print(punct_text)
	```
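
To punctuate several transcripts, one option is a small wrapper that normalizes whitespace and skips empty inputs before calling the model. The sketch below is an assumption, not part of this repository: `punctuate_all` is a hypothetical helper, and it takes any text-to-text callable so it can run here with a stand-in instead of the real model.

```python
from typing import Callable, Iterable, List


def punctuate_all(
    transcripts: Iterable[str], punctuate: Callable[[str], str]
) -> List[str]:
    """Apply `punctuate` to each non-empty transcript, preserving order."""
    results = []
    for transcript in transcripts:
        # Collapse runs of whitespace left over from ASR segment joins.
        cleaned = " ".join(transcript.split())
        # Pass empty strings through instead of calling the model on them.
        results.append(punctuate(cleaned) if cleaned else cleaned)
    return results


# Stand-in so the sketch runs without the model. With the model loaded as
# in the usage example, the callable could instead be:
#   punctuate = lambda t: get_prediction(
#       model=model, text=t, tokenizer=tokenizer,
#       backward_context=model.config.backward_context,
#       forward_context=model.config.forward_context,
#       return_prob=False,
#   )
def fake_punctuate(text: str) -> str:
    return text + "."


print(punctuate_all(["שלום  עולם", "", "בוקר טוב"], fake_punctuate))
```

Passing the model call in as a callable keeps the wrapper independent of the repository's `src` package, so it can be tested without downloading the weights.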

	## Contact

	For any questions or issues, please contact [email protected].