DanielJunior's picture
Duplicate from ulysses-camara/legal-bert-pt-br
50fd69b
---
language: pt
license: mit
tags:
- sentence-transformers
duplicated_from: ulysses-camara/legal-bert-pt-br
---
# LegalBERTPT-br
LegalBERTPT-br is a trained sentence embedding using SimCSE, a contrastive learning framework, coupled with the Portuguese pre-trained language model named [BERTimbau](https://huggingface.co/neuralmind/bert-base-portuguese-cased).
# Corpora
– From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we used the column `Conteudo` with 215,713 comments. We removed the comments from PL 3723/2019, PEC 471/2005, and Hashtag Corpus, in order to avoid bias.
– From [this site](https://www2.camara.leg.br/transparencia/servicos-ao-cidadao/participacao-popular), we also used 147,008 bills. From these projects, we used the summary field named `txtEmenta` and the project core text named `txtExplicacaoEmenta`.
– From Political Speeches, we used 462,831 texts, specifically, we used the columns: `sumario`, `textodiscurso`, and `indexacao`.
These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.
# Citing and Authors
If you find this model helpful, feel free to cite our publication [Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_8):
```bibtex
@inproceedings{bracis,
author = {Nádia Silva and Marília Silva and Fabíola Pereira and João Tarrega and João Beinotti and Márcio Fonseca and Francisco Andrade and André Carvalho},
title = {Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies},
booktitle = {Anais da X Brazilian Conference on Intelligent Systems},
location = {Online},
year = {2021},
keywords = {},
issn = {0000-0000},
publisher = {SBC},
address = {Porto Alegre, RS, Brasil},
url = {https://sol.sbc.org.br/index.php/bracis/article/view/19061}
}
```