BERT-BR
Image generated by ChatGPT with DALL-E from OpenAI.

Model description
BERT-BR is a BERT model pre-trained from scratch on a dataset of literary book reviews in Brazilian Portuguese. The model is specifically designed for understanding the context and sentiment of book reviews in Portuguese. BERT-BR features 6 layers, 4 attention heads, and an embedding dimension of 768.
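The model can be used as a feature extractor for Portuguese review text. The sketch below is a minimal example with the Transformers library; the repo id "your-username/bert-br" is a placeholder (the actual Hub path is not stated on this card), and the TensorFlow classes are chosen to match the framework versions listed at the end.

```python
from transformers import AutoTokenizer, TFAutoModel

# "your-username/bert-br" is a placeholder repo id, not the model's confirmed Hub path
model_id = "your-username/bert-br"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModel.from_pretrained(model_id)

# Encode a Portuguese book-review sentence and inspect the contextual embeddings
inputs = tokenizer("A prosa é envolvente do início ao fim.", return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```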
Training data
The BERT-BR model was pre-trained on a dataset of literary book reviews in Brazilian Portuguese. The dataset comprises a diverse range of book genres and review sentiments, making the model suitable for various book-related NLP tasks in Portuguese.
Usage ideas
- Sentiment analysis on book reviews in Portuguese (see the fine-tuning sketch after this list)
- Book recommendation systems in Portuguese
- Text classification for book genres in Portuguese
- Named entity recognition in book-related contexts in Portuguese
- Aspect extraction in book-related contexts in Portuguese
- Text generation for book summaries in Portuguese, with BERT-BR serving as the encoder in an encoder-decoder pipeline (the model itself is encoder-only and cannot generate text on its own)
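For the sentiment analysis use case, a fine-tuning sketch is shown below. It assumes the placeholder repo id "your-username/bert-br" and uses two toy labeled reviews in place of a real dataset; the hyperparameters (learning rate, sequence length, epochs) are illustrative defaults, not values from this card.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_id = "your-username/bert-br"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy labeled reviews (1 = positive, 0 = negative); replace with a real dataset
texts = ["Adorei o livro, recomendo!", "Enredo fraco e personagens rasos."]
labels = [1, 0]

enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(2)

# The classification head outputs raw logits, so the loss is computed from_logits
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(dataset, epochs=1)
```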
Limitations and bias
As the BERT-BR model was pre-trained on literary book reviews in Brazilian Portuguese, it may not perform as well on other types of text or reviews in different languages. Additionally, the model may inherit certain biases from the training data, which could affect its predictions or embeddings. The tokenizer is based on the BERTimbau tokenizer, which was specifically designed for Brazilian Portuguese text, so it might not work well with other languages or Portuguese variants.
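Since the tokenizer follows BERTimbau's vocabulary, its behavior on a given text can be inspected directly, as in the short sketch below (the repo id is again a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder repo id; the vocabulary follows the BERTimbau tokenizer for Brazilian Portuguese
tokenizer = AutoTokenizer.from_pretrained("your-username/bert-br")
print(tokenizer.tokenize("Uma resenha entusiasmada sobre um romance brasileiro."))
```

Text in other languages or Portuguese variants may be split into many subword pieces, which degrades embedding quality.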
Framework versions
- Transformers 4.21.3
- TensorFlow 2.9.1
- Datasets 2.7.0
- Tokenizers 0.12.1