Model Description

An uncensored Llama 3.2 3B reasoning model, trained on reasoning data.

It has been trained with improved training code and delivers improved performance.

This is Update 1 of the Thea 3B model. What's new:

  • Trained on more examples than the original Thea model.
  • Based on a different base model, with some of the lost accuracy points (hopefully) restored.

This model has not yet been tested in a GGUF setting. You can try it yourself by converting it with the GGUF My Repo space; a sketch of running a converted file follows below.
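
If you do produce a GGUF conversion, a minimal sketch of running it with llama-cpp-python might look like the following. The filename and settings are placeholders, not files this repo ships, and the custom reasoning flow shown later may not carry over to a generic GGUF chat template:

# Hypothetical sketch: running a converted GGUF with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="thea-3b-50r-u1-q4_k_m.gguf",  # placeholder name for a converted file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Which is greater, 9.9 or 9.11?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])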

Here is the inference code you should use:

from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_REASONING_TOKENS = 1024
MAX_RESPONSE_TOKENS = 512

model_name = "lunahr/thea-3b-50r-u1"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Which is greater 9.9 or 9.11 ??"
messages = [
    {"role": "user", "content": prompt}
]

# Generate reasoning (add_reasoning_prompt is a custom flag defined in this model's chat template)
reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("REASONING: " + reasoning_output)

# Generate the final answer, feeding the reasoning back as a custom "reasoning" role turn
messages.append({"role": "reasoning", "content": reasoning_output})
response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("ANSWER: " + response_output)

Intended Use

This model is intended as an OpenAI o1 replacement for weaker hardware, mimicking o1's response formatting.

Limitations

This Llama model was trained faster than with Unsloth, using custom training code.

Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to find out how you can finetune your models using both of the Kaggle-provided GPUs.
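
The linked notebook is the authoritative reference; purely as an illustration of the general pattern, a standard transformers Trainer script launched with torchrun shards training across both T4s via DistributedDataParallel. Every name below (base model, data file, hyperparameters) is a placeholder, not the author's actual setup:

# Illustrative sketch only -- not the author's training code.
# Launch across both Kaggle T4s with:  torchrun --nproc_per_node=2 train.py
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # Llama ships no pad token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

ds = load_dataset("json", data_files="reasoning_data.jsonl")["train"]  # placeholder data
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,  # per GPU; torchrun runs one process per T4
    gradient_accumulation_steps=8,
    fp16=True,                      # T4s do not support bf16
    num_train_epochs=1,
)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()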
