Model Description
An uncensored Llama 3.2 3B model fine-tuned on reasoning data. It was trained with improved training code and delivers improved performance.
This is the Thea 3B Update 1 model. The new features are:
- Trained on more examples than the original Thea model.
- Based on a different base model, with some of the previously lost accuracy (hopefully) restored.
This model has not been tested in a GGUF setting yet. You can convert and try it yourself using the GGUF My Repo space.
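If you do convert it, a minimal local-inference sketch using llama-cpp-python might look like the following. The GGUF filename is a hypothetical placeholder, and the custom reasoning chat template may not survive conversion:

# Hypothetical sketch: assumes you have already produced a GGUF conversion of this model.
from llama_cpp import Llama

llm = Llama(model_path="thea-3b-50r-u1.Q4_K_M.gguf", n_ctx=4096)  # placeholder filename
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Which is greater, 9.9 or 9.11?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])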
Here is the inference code you should use:
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_REASONING_TOKENS = 1024
MAX_RESPONSE_TOKENS = 512

model_name = "lunahr/thea-3b-50r-u1"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which is greater, 9.9 or 9.11?"
messages = [
    {"role": "user", "content": prompt}
]

# Step 1: generate the reasoning trace.
reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
# Decode only the newly generated tokens, skipping the prompt.
reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("REASONING: " + reasoning_output)

# Step 2: feed the reasoning back as a dedicated turn and generate the final answer.
messages.append({"role": "reasoning", "content": reasoning_output})
response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("ANSWER: " + response_output)
Intended Use
This model is intended as a replacement for OpenAI o1 on weaker hardware, mimicking o1's response formatting.
Limitations
Due to its small size, this model may have a higher chance of producing hallucinations.
Some questions may be answered incorrectly.
This model is uncensored; exercise caution when generating sensitive content.
Trained by: Piotr Zalewski
License: llama3.2
Architecture: llama3.2
Finetuned from model: CreitinGameplays/Llama-3.2-3b-Instruct-uncensored-refinetune
Dataset used: KingNish/reasoning-base-20k
This Llama model was trained with custom training code that is faster than Unsloth.
Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to find out how you can finetune your models using both of the Kaggle-provided GPUs.
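As an illustrative sketch only (the linked notebook contains the actual training code), a standard transformers Trainer script launched with torchrun replicates the model across both T4s via DistributedDataParallel. The dataset column names below are assumptions; check the dataset card for the real schema.

# Illustrative sketch, not the author's training code.
# Launch with: torchrun --nproc_per_node=2 train.py
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base = "CreitinGameplays/Llama-3.2-3b-Instruct-uncensored-refinetune"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset("KingNish/reasoning-base-20k", split="train")

def tokenize(batch):
    # Column names ("user", "reasoning", "assistant") are assumptions.
    text = [u + "\n" + r + "\n" + a
            for u, r, a in zip(batch["user"], batch["reasoning"], batch["assistant"])]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="thea-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    fp16=True,  # T4 GPUs do not support bf16
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()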