---
license: apache-2.0
datasets:
- AiresPucrs/sentiment-analysis
language:
- en
metrics:
- accuracy
library_name: keras
---
# English Embedding v.16 (Teeny-Tiny Castle)

This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.

## How to Use

```python
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary_16.keras",
                local_dir="./",
                repo_type="model"
                )

# Download the embedding vocabulary txt file
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary.txt",
                local_dir="./",
                repo_type="model"
                )

model = tf.keras.models.load_model('english_embedding_vocabulary_16.keras')

# Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Read the vocabulary, one token per line (the `with` block closes the file)
with open('english_embedding_vocabulary.txt', encoding='utf-8') as fp:
    english_embedding_vocabulary = [line.strip() for line in fp]

# Weight matrix of the embedding layer: one row per vocabulary index
embeddings = model.get_layer('embedding').get_weights()[0]

words_embeddings = {}

# Map each vocabulary word to its embedding vector
for i, word in enumerate(english_embedding_vocabulary):
    # Skip index 0 (""), which is just the PAD token
    if i == 0:
        continue
    words_embeddings[word] = embeddings[i]

print("Embeddings Dimensions: ", np.array(list(words_embeddings.values())).shape)
print("Vocabulary Size: ", len(words_embeddings.keys()))
```
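
Once `words_embeddings` is built, one common way to explore it is to rank words by cosine similarity to a query word. The sketch below is only an illustration, not part of the original tutorial: the helper names `cosine_similarity` and `nearest_words` are hypothetical, and the `toy_embeddings` dictionary uses made-up 3-dimensional vectors so the snippet runs without downloading the model.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def nearest_words(query, table, k=3):
    """Return the k words in `table` most similar to `query`, best first."""
    if query not in table:
        raise KeyError(f"'{query}' is not in the vocabulary")
    q = table[query]
    scores = [(w, cosine_similarity(q, v)) for w, v in table.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy stand-in for `words_embeddings`; these values are invented for illustration.
toy_embeddings = {
    "good": np.array([1.0, 0.9, 0.0]),
    "great": np.array([0.9, 1.0, 0.1]),
    "bad": np.array([-1.0, -0.8, 0.0]),
}

for word, score in nearest_words("good", toy_embeddings, k=2):
    print(f"{word}: {score:.3f}")
```

To try it on the real vectors, pass `words_embeddings` in place of `toy_embeddings`. Because the embeddings come from a sentiment-analysis dataset, nearby words may reflect shared sentiment rather than general semantic similarity.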