---
license: apache-2.0
datasets:
- AiresPucrs/sentiment-analysis
language:
- en
metrics:
- accuracy
library_name: keras
---

# English Embedding v.16 (Teeny-Tiny Castle)

This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.

## How to Use

```python
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary_16.keras",
                local_dir="./",
                repo_type="model"
                )

# Download the embedding vocabulary txt file
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary.txt",
                local_dir="./",
                repo_type="model"
                )

# Load the Keras model from the downloaded file
model = tf.keras.models.load_model('english_embedding_vocabulary_16.keras')

# Compile the model (only needed for evaluation or further training)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Read the vocabulary, one token per line (the `with` block closes the file)
with open('english_embedding_vocabulary.txt', encoding='utf-8') as fp:
    english_embedding_vocabulary = [line.strip() for line in fp]

# Weight matrix of the layer named 'embedding': one row per vocabulary token
embeddings = model.get_layer('embedding').get_weights()[0]

words_embeddings = {}

# Map each word in the vocabulary to its embedding vector
for i, word in enumerate(english_embedding_vocabulary):
    # Skip token 0 (""), because it is just the PAD token.
    if i == 0:
        continue
    words_embeddings[word] = embeddings[i]

print("Embeddings Dimensions: ", np.array(list(words_embeddings.values())).shape)
print("Vocabulary Size: ", len(words_embeddings.keys()))
```

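Once `words_embeddings` has been built, one way to explore it is to look up a word's nearest neighbors by cosine similarity. The sketch below is not part of the original tutorial: `nearest_words` and the toy vectors are hypothetical, introduced here only for illustration, and the block uses NumPy alone so that it runs without downloading the model.

```python
import numpy as np


def nearest_words(words_embeddings, query, k=5):
    """Return the k words most similar to `query` by cosine similarity.

    Hypothetical helper (not part of the model repository). Expects a dict
    mapping each word to a 1-D NumPy vector, like `words_embeddings` above.
    """
    words = [w for w in words_embeddings if w != query]
    matrix = np.stack([words_embeddings[w] for w in words])
    target = words_embeddings[query]
    # Cosine similarity: dot product divided by the product of the norms.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
    sims = matrix @ target / np.maximum(norms, 1e-12)
    # Sort by descending similarity and keep the first k entries.
    top = np.argsort(-sims)[:k]
    return [(words[i], float(sims[i])) for i in top]


# Toy vectors for illustration only; swap in the real `words_embeddings` dict.
toy_embeddings = {
    "good": np.array([1.0, 0.1]),
    "great": np.array([0.9, 0.2]),
    "bad": np.array([-1.0, 0.1]),
}
print(nearest_words(toy_embeddings, "good", k=2))
```

With the real dictionary, the call would look like `nearest_words(words_embeddings, "good")`, assuming the query word is present in the vocabulary; a missing word raises a `KeyError`.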