---
license: apache-2.0
datasets:
- AiresPucrs/sentiment-analysis
language:
- en
metrics:
- accuracy
library_name: keras
---
# English Embedding v.16 (Teeny-Tiny Castle)

This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.

## How to Use

```python
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
hf_hub_download(
    repo_id="AiresPucrs/english-embedding-vocabulary-16",
    filename="english_embedding_vocabulary_16.keras",
    local_dir="./",
    repo_type="model",
)

# Download the embedding vocabulary txt file
hf_hub_download(
    repo_id="AiresPucrs/english-embedding-vocabulary-16",
    filename="english_embedding_vocabulary.txt",
    local_dir="./",
    repo_type="model",
)

model = tf.keras.models.load_model('english_embedding_vocabulary_16.keras')

# Compile the model (needed for evaluation or further training, not for reading the weights)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Read the vocabulary, one token per line; the context manager closes the file
with open('english_embedding_vocabulary.txt', encoding='utf-8') as fp:
    english_embedding_vocabulary = [line.strip() for line in fp]

# Weight matrix of the layer named 'embedding', one row per token
embeddings = model.get_layer('embedding').get_weights()[0]

words_embeddings = {}

# Map each word to its embedding vector
for i, word in enumerate(english_embedding_vocabulary):
    # Skip token 0 (""), because it is just the PAD token.
    if i == 0:
        continue
    words_embeddings[word] = embeddings[i]

print("Embeddings Dimensions: ", np.array(list(words_embeddings.values())).shape)
print("Vocabulary Size: ", len(words_embeddings))
```
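
Once the `words_embeddings` dictionary is built, one simple way to inspect it is to look up the nearest neighbors of a word. The snippet below is a minimal sketch of that idea using cosine similarity. The helper name `most_similar` and the toy three-dimensional vectors are assumptions made for illustration and are not part of this repository; cosine similarity is a common choice here, not necessarily the one used in the tutorial.

```python
import numpy as np


def most_similar(word, words_embeddings, top_k=3):
    """Return the top_k words closest to `word` by cosine similarity.

    Hypothetical helper, not part of the repository.
    """
    if word not in words_embeddings:
        raise KeyError(f"'{word}' is not in the vocabulary")

    vocab = list(words_embeddings.keys())
    matrix = np.stack([words_embeddings[w] for w in vocab]).astype(np.float64)

    # Normalize each row to unit length; the floor guards against zero vectors.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.maximum(norms, 1e-12)

    # Cosine similarity between the query word and every word in the vocabulary.
    scores = unit @ unit[vocab.index(word)]

    # Sort from most to least similar and drop the query word itself.
    ranked = [
        (vocab[i], float(scores[i]))
        for i in np.argsort(-scores)
        if vocab[i] != word
    ]
    return ranked[:top_k]


# Toy stand-in for the real `words_embeddings` dictionary (made-up vectors).
toy_embeddings = {
    "good": np.array([1.0, 0.1, 0.0]),
    "great": np.array([0.9, 0.2, 0.0]),
    "bad": np.array([-1.0, 0.1, 0.0]),
    "awful": np.array([-0.9, 0.2, 0.1]),
}

print(most_similar("good", toy_embeddings, top_k=2))
```

To try it on the real embeddings, pass the `words_embeddings` dictionary from the previous block in place of `toy_embeddings`. Since the front matter lists a sentiment-analysis dataset, the neighbors may reflect sentiment more than general word meaning.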