Hebrew Language Model for Long Documents
State-of-the-art Longformer language model for Hebrew.
How to use
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('HeNLP/LongHeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/LongHeRo')
# Tokenization Example:
# Tokenizing
tokenized_string = tokenizer('שלום לכולם')
# Decoding
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
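Since the model is loaded with AutoModelForMaskedLM, a natural next step is to predict a masked token. The following is a minimal sketch, not part of the original card: it assumes the checkpoint works with the standard transformers `fill-mask` pipeline, and the helper names `insert_mask` and `predict_masked` as well as the example sentence are hypothetical, chosen only for illustration.

```python
def insert_mask(text: str, word: str, mask_token: str) -> str:
    """Replace the first occurrence of `word` in `text` with `mask_token`."""
    if word not in text:
        raise ValueError(f"{word!r} does not occur in the text")
    return text.replace(word, mask_token, 1)


def predict_masked(text: str, word: str, top_k: int = 5):
    """Mask `word` in `text` and return (token, score) candidates for it."""
    # Imported lazily so insert_mask can be used without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint is compatible with the fill-mask pipeline.
    fill_mask = pipeline('fill-mask', model='HeNLP/LongHeRo')
    masked = insert_mask(text, word, fill_mask.tokenizer.mask_token)
    return [(p['token_str'], p['score']) for p in fill_mask(masked, top_k=top_k)]


if __name__ == '__main__':
    # Illustrative sentence; running this downloads the model weights.
    for token, score in predict_masked('שלום לכולם', 'לכולם'):
        print(token, round(score, 3))
```

The mask token is read from the tokenizer rather than hard-coded, so the sketch does not depend on which special-token string the checkpoint uses.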
Citing
If you use LongHeRo in your research, please cite HeRo: RoBERTa and Longformer Hebrew Language Models.
@article{shalumov2023hero,
title={HeRo: RoBERTa and Longformer Hebrew Language Models},
author={Vitaly Shalumov and Harel Haskey},
year={2023},
journal={arXiv:2304.11077},
}