---
license: apache-2.0
datasets:
- oeg/CelebA_RoBERTa_Sp
language:
- es
tags:
- Spanish
- CelebA
- Roberta-base-bne
- celebFaces Attributes
pipeline_tag: text-to-image
---
# RoBERTa base BNE trained with data from the descriptive text corpus of the CelebA dataset

## Overview

- **Language**: Spanish
- **Data**: [CelebA_RoBERTa_Sp](https://huggingface.co/datasets/oeg/CelebA_RoBERTa_Sp).
- **Architecture**: roberta-base

## Description

To improve the performance of the RoBERTa encoder, this model was trained on the generated corpus ([in this repository](https://huggingface.co/oeg/RoBERTa-CelebA-Sp/)) using a Siamese network with a cosine-similarity loss function. The following steps were followed:

- Set up the sentence-transformers and torch libraries used to implement the encoder.
- Divide the corpus into two parts: 249,999 sentences for training and 10,000 sentences for validation.
- Load the training and validation data. Two lists are built, one per split; each entry consists of a pair of descriptive sentences and their similarity value.
- Use RoBERTa as the baseline model for transformer training.
- Train with a Siamese network in which, for a pair of sentences _A_ and _B_ from the training corpus, the similarity of their embedding vectors _u_ and _v_ is evaluated using the cosine similarity metric (_CosineSimilarityLoss()_).
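The steps above can be sketched with the sentence-transformers library roughly as follows. This is a minimal sketch and not the authors' training script: the base checkpoint name `PlanTL-GOB-ES/roberta-base-bne`, the mean pooling, the shuffled split, the hyperparameters, and the output path `roberta-celeba-sp` are all assumptions made for illustration. In sentence-transformers, `CosineSimilarityLoss` compares the cosine similarity of the two sentence embeddings with the gold similarity value (using a mean squared error by default).

```python
import random


def split_pairs(pairs, n_val=10_000, seed=0):
    """Shuffle (sentence_a, sentence_b, similarity) triples and hold out n_val for validation.

    With 259,999 triples this leaves 249,999 for training, matching the split
    described above. The shuffling itself is an assumption.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


def train_siamese(train_pairs, val_pairs,
                  base_model="PlanTL-GOB-ES/roberta-base-bne",  # assumed checkpoint
                  epochs=1, batch_size=16,                       # assumed hyperparameters
                  output_path="roberta-celeba-sp"):              # hypothetical path
    # Imported here so split_pairs() can be used without the training dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import InputExample, SentenceTransformer, losses, models
    from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

    # RoBERTa encoder followed by mean pooling (the pooling mode is an assumption).
    encoder = models.Transformer(base_model)
    pooling = models.Pooling(encoder.get_word_embedding_dimension(), pooling_mode="mean")
    model = SentenceTransformer(modules=[encoder, pooling])

    # Each entry is a pair of descriptive sentences plus their similarity value.
    train_examples = [InputExample(texts=[a, b], label=float(s)) for a, b, s in train_pairs]
    val_examples = [InputExample(texts=[a, b], label=float(s)) for a, b, s in val_pairs]

    # Siamese setup: both sentences go through the same encoder, and the loss
    # compares cos(u, v) with the gold similarity.
    loader = DataLoader(train_examples, shuffle=True, batch_size=batch_size)
    loss = losses.CosineSimilarityLoss(model)
    evaluator = EmbeddingSimilarityEvaluator.from_input_examples(val_examples, name="val")

    model.fit(train_objectives=[(loader, loss)], evaluator=evaluator,
              epochs=epochs, output_path=output_path)
    return model
```

Given a list `pairs` of `(sentence_a, sentence_b, similarity)` triples loaded from the corpus, `train_siamese(*split_pairs(pairs))` would run the whole procedure; training requires the `sentence-transformers` and `torch` packages and downloads the base checkpoint.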
## How to use

## Licensing information

This model is available under the [Apache License 2.0.](https://www.apache.org/licenses/LICENSE-2.0)

## Citation information

**Citing**: If you use the RoBERTa+CelebA model in your work, please cite the **[????](???)**:

```bib
@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}
```

## Authors

- [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
- [Manuel Castillo-Cara](https://github.com/manwestc)
- [Raúl García-Castro](https://github.com/rgcmme)

[*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)

## Contributors

See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).