license: apache-2.0
K2-Chat: a fully-reproducible large language model outperforming Llama 2 70B using 35% less compute
![k2 eval table](/LLM360/K2-Chat/resolve/main/k2_chat_eval_table.png)
![k2 big eval table](/LLM360/K2-Chat/resolve/main/k2_chat_table_of_tables.png)
Loading K2-Chat
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-Chat")
model = AutoModelForCausalLM.from_pretrained("LLM360/K2-Chat")

# Prompt format: the user text sits between <|beginofuser|> and <|beginofsystem|>
prompt = '<|beginofuser|>what is the highest mountain on earth?<|beginofsystem|>'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample up to 128 new tokens after the prompt
gen_tokens = model.generate(input_ids, do_sample=True, max_new_tokens=128)

print("-"*20 + "Output for model" + 20 * '-')
print(tokenizer.batch_decode(gen_tokens)[0])
```
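
The prompt string in the loading example can be assembled with a small helper, and the reply can be separated from the echoed prompt in the decoded output. The sketch below is illustrative only: `build_prompt` and `extract_reply` are hypothetical names that are not part of the K2-Chat release, and it assumes the single-turn layout shown in the example (user text between `<|beginofuser|>` and `<|beginofsystem|>`). Multi-turn formatting is not covered here.

```python
# Tags copied from the example prompt; the helper names below are hypothetical.
USER_TAG = "<|beginofuser|>"
SYSTEM_TAG = "<|beginofsystem|>"


def build_prompt(user_message: str) -> str:
    """Wrap one user message in the tags used by the example prompt."""
    return f"{USER_TAG}{user_message.strip()}{SYSTEM_TAG}"


def extract_reply(decoded: str) -> str:
    """Return the text after the last SYSTEM_TAG, or the whole string if the tag is absent."""
    return decoded.rpartition(SYSTEM_TAG)[2].strip()


prompt = build_prompt("what is the highest mountain on earth?")
```

Passing `prompt` to the tokenizer reproduces the input used in the loading example, and `extract_reply(tokenizer.batch_decode(gen_tokens)[0])` would drop the echoed prompt. Any end-of-sequence token the model emits is left in place, since its exact form is not stated here.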