# perSLIMmon-8b-base
persimmon-8b went to the vocab lipo clinic
A slimmed-down version of persimmon-8b-base that removes the ~70,000 unused entries from the model vocabulary and tokenizer (see the safetensors layer overview). It should be slightly faster as a result.
Credit: fine-tune-fuyu (scripts/surgery.py was adapted for persimmon).
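For the curious, here is a minimal sketch of what this kind of vocab surgery involves: slice the unused rows out of the input embeddings and the LM head, then shrink the config to match. This is not the actual scripts/surgery.py, and `used_ids` is a placeholder; in the real script the kept IDs come from auditing which tokenizer entries are actually populated.

```python
# hypothetical sketch of vocab surgery, NOT the actual scripts/surgery.py
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

# placeholder: the real list of kept token IDs comes from auditing the tokenizer
used_ids = torch.arange(192_000)

model = AutoModelForCausalLM.from_pretrained("adept/persimmon-8b-base")

with torch.no_grad():
    # copy only the kept rows of the input embedding matrix
    old_emb = model.get_input_embeddings()
    new_emb = nn.Embedding(len(used_ids), old_emb.embedding_dim)
    new_emb.weight.copy_(old_emb.weight[used_ids])
    model.set_input_embeddings(new_emb)

    # same treatment for the output projection (LM head)
    old_head = model.get_output_embeddings()
    new_head = nn.Linear(old_head.in_features, len(used_ids), bias=False)
    new_head.weight.copy_(old_head.weight[used_ids])
    model.set_output_embeddings(new_head)

model.config.vocab_size = len(used_ids)
model.save_pretrained("perSLIMmon-8b-base")
```

The tokenizer needs the same treatment, rebuilt so its token IDs still line up with the truncated embedding rows.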
## inference
install the required packages:

```sh
pip install -U transformers accelerate bitsandbytes sentencepiece
```
load in 4-bit and run inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/perSLIMmon-8b-base")
model = AutoModelForCausalLM.from_pretrained(
    "pszemraj/perSLIMmon-8b-base",
    load_in_4bit=True,  # GPU required
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(
    model.device
)
# nucleus sampling with a light repetition penalty
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    epsilon_cutoff=1e-5,
    repetition_penalty=1.05,
    renormalize_logits=True,
    do_sample=True,
)  # adapt inference params as needed
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
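If your transformers version warns that `load_in_4bit` is deprecated as a direct `from_pretrained` argument, the same 4-bit load can be expressed through an explicit quantization config. A sketch of the equivalent call:

```python
# equivalent 4-bit load via an explicit quantization config
# (newer transformers releases prefer this over the bare load_in_4bit flag)
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "pszemraj/perSLIMmon-8b-base",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype="auto",
    device_map="auto",
)
```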
Inference is decently fast on a Colab T4:
```text
CPU times: user 6.01 s, sys: 138 ms, total: 6.15 s
Wall time: 6.23 s
```
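The output format above matches IPython's `%%time` cell magic; to reproduce a similar wall-clock measurement in a plain script, a quick sketch (reusing `model` and `inputs` from the snippet above):

```python
import time

# rough wall-clock timing of a single generate() call
start = time.perf_counter()
tokens = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(f"Wall time: {time.perf_counter() - start:.2f} s")
```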