BERT-BR
Image generated by ChatGPT with DALL-E from OpenAI.

Model description
BERT-BR is a BERT model pre-trained from scratch on a dataset of literary book reviews in Brazilian Portuguese. The model is specifically designed for understanding the context and sentiment of book reviews in Portuguese. BERT-BR features 6 layers, 4 attention heads, and an embedding dimension of 768.
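The model can be used as a feature extractor for Portuguese review text. The sketch below is a minimal example with the Transformers library; the repo id "your-username/bert-br" is a placeholder (the actual Hub path is not stated on this card), and the TensorFlow classes are chosen to match the framework versions listed at the end.

```python
from transformers import AutoTokenizer, TFAutoModel

# "your-username/bert-br" is a placeholder repo id, not the model's confirmed Hub path
model_id = "your-username/bert-br"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModel.from_pretrained(model_id)

# Encode a Portuguese book-review sentence and inspect the contextual embeddings
inputs = tokenizer("A prosa é envolvente do início ao fim.", return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```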
Training data
The BERT-BR model was pre-trained on a dataset of literary book reviews in Brazilian Portuguese. The dataset comprises a diverse range of book genres and review sentiments, making the model suitable for various book-related NLP tasks in Portuguese.
Usage ideas
- Sentiment analysis on book reviews in Portuguese (see the fine-tuning sketch after this list)
- Book recommendation systems in Portuguese
- Text classification for book genres in Portuguese
- Named entity recognition in book-related contexts in Portuguese
- Aspect extraction in book-related contexts in Portuguese
- Text generation for book summaries in Portuguese, with BERT-BR serving as the encoder in an encoder-decoder pipeline (the model itself is encoder-only and cannot generate text on its own)
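For the sentiment analysis use case, a fine-tuning sketch is shown below. It assumes the placeholder repo id "your-username/bert-br" and uses two toy labeled reviews in place of a real dataset; the hyperparameters (learning rate, sequence length, epochs) are illustrative defaults, not values from this card.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_id = "your-username/bert-br"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy labeled reviews (1 = positive, 0 = negative); replace with a real dataset
texts = ["Adorei o livro, recomendo!", "Enredo fraco e personagens rasos."]
labels = [1, 0]

enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(2)

# The classification head outputs raw logits, so the loss is computed from_logits
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(dataset, epochs=1)
```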
Limitations and bias
As the BERT-BR model was pre-trained on literary book reviews in Brazilian Portuguese, it may not perform as well on other types of text or reviews in different languages. Additionally, the model may inherit certain biases from the training data, which could affect its predictions or embeddings. The tokenizer is based on the BERTimbau tokenizer, which was specifically designed for Brazilian Portuguese text, so it might not work well with other languages or Portuguese variants.
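Since the tokenizer follows BERTimbau's vocabulary, its behavior on a given text can be inspected directly, as in the short sketch below (the repo id is again a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder repo id; the vocabulary follows the BERTimbau tokenizer for Brazilian Portuguese
tokenizer = AutoTokenizer.from_pretrained("your-username/bert-br")
print(tokenizer.tokenize("Uma resenha entusiasmada sobre um romance brasileiro."))
```

Text in other languages or Portuguese variants may be split into many subword pieces, which degrades embedding quality.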
Framework versions
- Transformers 4.21.3
- TensorFlow 2.9.1
- Datasets 2.7.0
- Tokenizers 0.12.1