Model Card for baseten/smol_llama-101M-GQA

Model Details

This repository is a re-upload of BEE-spoke-data/smol_llama-101M-GQA to the baseten organization, created using the following script:

from transformers import LlamaTokenizer, LlamaForCausalLM

# 1. Load the source model and tokenizer from the Hugging Face Hub
org_in = "BEE-spoke-data/"
org_out = "baseten/"
model_name = "smol_llama-101M-GQA"
# When True, only the base LlamaModel (without the LM head) is pushed, under an "embedding-" prefix
embedding = False
embedding_name = "embedding-" if embedding else ""

tokenizer = LlamaTokenizer.from_pretrained(org_in + model_name)
model = LlamaForCausalLM.from_pretrained(org_in + model_name)

# Sanity-check generation with the downloaded weights
prompt = "Hello, this is a test."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Strip the LM head if only the base (embedding) model should be uploaded
model_llama = model
if embedding:
    model_llama = model_llama.model

# 2. Push the model and tokenizer to the new Hugging Face repository
# Authenticate with `huggingface-cli login` beforehand, or pass an access token explicitly
repo_id = org_out + embedding_name + model_name
model_llama.push_to_hub(repo_id, token="xxx")
tokenizer.push_to_hub(repo_id, token="xxx")
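
Once the upload finishes, the copy can be loaded straight from the new repository. A minimal sketch of such a verification step, assuming the push above succeeded with embedding = False (so the repository name is baseten/smol_llama-101M-GQA):

from transformers import LlamaTokenizer, LlamaForCausalLM

# Load the pushed copy back from the Hub (repo name assumed from the script above)
repo_id = "baseten/smol_llama-101M-GQA"
tokenizer = LlamaTokenizer.from_pretrained(repo_id)
model = LlamaForCausalLM.from_pretrained(repo_id)

# Generate a short completion to confirm the re-uploaded weights behave like the original
inputs = tokenizer("Hello, this is a test.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If embedding were set to True, the pushed artifact would instead be the bare LlamaModel (no LM head) under the "embedding-" prefix, and it would need to be loaded with a base-model class rather than LlamaForCausalLM.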
Model size: 101M params · Tensor type: F32 · Format: Safetensors