---
license: apache-2.0
datasets:
- oeg/CelebA_RoBERTa_Sp
language:
- es
tags:
- Spanish
- CelebA
- Roberta-base-bne
- celebFaces Attributes
pipeline_tag: text-to-image
---
# RoBERTa base BNE trained with data from the descriptive text corpus of the CelebA dataset

## Overview

- **Language**: Spanish
- **Data**: [CelebA_RoBERTa_Sp](https://huggingface.co/datasets/oeg/CelebA_RoBERTa_Sp)
- **Architecture**: roberta-base
  
## Description
To improve the performance of the RoBERTa encoder, this model was trained on the generated corpus ([in this repository](https://huggingface.co/oeg/RoBERTa-CelebA-Sp/))
using a Siamese network together with a cosine-similarity loss function. The following steps were taken:
- Import the sentence-transformers and torch libraries for the implementation of the encoder.
- Split the training corpus into two parts: 249,999 sentences for training and 10,000 sentences for validation.
- Load the training/validation data into the model. Two lists are generated to store the information; in each of them,
  an entry consists of a pair of descriptive sentences and their similarity value.
- Use RoBERTa as the base model for transformer training.
- Train a Siamese network in which, for each pair of sentences _A_ and _B_ from the training corpus, the similarity of their embedding
  vectors _u_ and _v_ is evaluated with the cosine-similarity metric (_CosineSimilarityLoss()_).
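
The objective in the final training step can be sketched in plain Python. This is an illustrative re-implementation of the cosine-similarity objective, not the library's internal code:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between embedding vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(u, v, gold_score):
    """Squared error between the predicted cosine similarity of a
    sentence pair and its gold similarity label, which is what a
    CosineSimilarityLoss-style objective minimises."""
    return (cosine_similarity(u, v) - gold_score) ** 2

# Identical vectors have cosine similarity 1.0, so the loss is 0
# when the gold similarity label is also 1.0.
print(cosine_similarity_loss([1.0, 2.0], [1.0, 2.0], 1.0))  # → 0.0
```

During training, _u_ and _v_ come from the two branches of the Siamese network, and the gold score is the similarity value stored with each sentence pair.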

## How to use
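
A minimal usage sketch with the `sentence-transformers` library, assuming the model is published under the Hub repository id `oeg/RoBERTa-CelebA-Sp` (the example sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

# Load the trained encoder from the Hugging Face Hub
model = SentenceTransformer("oeg/RoBERTa-CelebA-Sp")

# Encode two descriptive sentences in Spanish
sentences = [
    "La mujer tiene el pelo castaño y una gran sonrisa.",
    "Una mujer sonriente con cabello castaño.",
]
embeddings = model.encode(sentences)

# Compare them with the same metric used during training
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```

Higher scores (closer to 1.0) indicate that the two descriptions are semantically closer.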


## Licensing information
This model is available under the [Apache License 2.0.](https://www.apache.org/licenses/LICENSE-2.0)

## Citation information

**Citing**: If you use the RoBERTa+CelebA model in your work, please cite **[????](???)**:

```bib
@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {10.1016/j.inffus.2022.10.011}
}
```

## Authors
- [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
- [Manuel Castillo-Cara](https://github.com/manwestc)
- [Raúl García-Castro](https://github.com/rgcmme)

[*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)

## Contributors
See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).

<kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd>
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd> 
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd>