This is Aclevo's finetune of Google's Gemma model. This finetune adds advanced Chain-of-Thought (CoT) reasoning to Gemma, allowing for more thoughtful and accurate responses.

Here are some of the significant improvements, comparing the base Gemma model to AclevoGPT-Gemma-2b-CoT-reasoning.

Regular Gemma:

User: reason your way through this problem carefully: Sally, Mark, and James are all in one room and no one else. James is cooking, Mark is coding, and Sally is playing chess. What is the logical problem with Sally's game?

Response:

  1. I need to understand the scenario. There are three people in a room: Sally, Mark, and James.
  2. I need to understand the activities of each person. James is cooking, Mark is coding, and Sally is playing chess.
  3. The logic problem is focused on Sally's game.

Response: There is no logical problem with Sally's game.

Thought:

  1. The question states that everyone is in the same room, which implies there is sufficient space for all of them to play their own games.

Sally can play her game undisturbed in this scenario. It is possible that there are many games going on in the room.

Aclevo/AclevoGPT-Gemma-2b-CoT-reasoning:

User: reason your way through this problem carefully: Sally, Mark, and James are all in one room and no one else. James is cooking, Mark is coding, and Sally is playing chess. What is the logical problem with Sally's game?

To determine the logical problem with Sally's game, I need to analyze the scenario provided and identify any contradictions or inconsistencies that might arise from the given information. The scenario states that Sally, Mark, and James are in one room, which suggests that there is a shared space where they are engaging in their respective activities.

First, I'll focus on Sally's game, which is playing chess. The question asks for the logical problem with Sally's game, implying that there is an unexpected or problematic aspect of her game. To pinpoint the issue, I'll consider the following steps:

  1. Determine the Context: The scenario indicates that Sally is playing chess, and her primary activity is playing chess. This suggests that Sally's primary focus is on her game, which is unusual for a person who is in a room with two others (Mark and James) engaged in different activities.

  2. Analyze the Activities:

    • James is Cooking: James is engaged in cooking, which is a physical activity that involves multiple senses and requires concentration. It's possible that James's cooking activity could overlap or compete with Sally's game, leading to potential distractions or logistical challenges.

Here is how to run this model:


import torch
from transformers import AutoModelForCausalLM, GemmaTokenizer, TextIteratorStreamer

torch.cuda.empty_cache()

def main():
    # Load the finetuned, merged model from your saved directory.
    model = AutoModelForCausalLM.from_pretrained(
        "Aclevo/AclevoGPT-Gemma-2b-CoT-reasoning",
        torch_dtype=torch.float16  # adjust if needed
    )

    # Load the Gemma tokenizer from a local folder containing tokenizer.model.
    # (If the hub repo ships its own tokenizer files, you can pass the repo id
    # "Aclevo/AclevoGPT-Gemma-2b-CoT-reasoning" here instead.)
    tokenizer = GemmaTokenizer.from_pretrained(
        "./tokenizer",  # this folder should contain tokenizer.model
        use_fast=False
    )

    # Move the model to GPU if available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    print("Chat session started. Type 'exit' or 'quit' to end the session.")

    while True:
        user_input = input("User: ")
        if user_input.lower() in ["exit", "quit"]:
            break

        # Build a prompt with a system instruction and the user input.
        # This template instructs the model to provide its chain-of-thought under "Thought:" followed by a "Response:".
        prompt = (
            "You are an AI assistant named AclevoGPT. You are capable of advanced reasoning and will always think twice before giving an answer. "
            "When responding, first provide a detailed chain-of-thought under 'Thought:' and then your final answer under 'Response:'.\n"
            f"User: {user_input}\n"
            "Assistant:\nThought: "
        )

        # Encode the prompt.
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

        # Set up a streamer that will yield tokens as they are generated.
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

        # TextIteratorStreamer is designed to be consumed from a different
        # thread than the one calling generate(), so run generation in a
        # background thread and print tokens here as they are produced.
        import threading  # local import keeps this snippet self-contained

        generation_thread = threading.Thread(
            target=model.generate,
            kwargs={
                "inputs": input_ids,
                "max_new_tokens": 256,
                "do_sample": True,
                "temperature": 0.7,
                "top_k": 50,
                "pad_token_id": tokenizer.eos_token_id,
                "streamer": streamer,
            },
        )
        generation_thread.start()

        # Print tokens as they are generated.
        for token_text in streamer:
            print(token_text, end="", flush=True)
        generation_thread.join()
        print("\n")  # Newline after generation.

if __name__ == "__main__":
    main()
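Because the prompt template above instructs the model to emit its reasoning under "Thought:" followed by its final answer under "Response:", you may want to separate the two once generation finishes (for example, to show only the answer to end users). Here is a minimal sketch of such post-processing; the `split_cot` helper is an assumption of ours, not part of the model card's script:

```python
def split_cot(generated_text):
    """Split a completed generation into (thought, response).

    Assumes the model followed the "Thought: ... Response: ..." template.
    If the "Response:" marker is missing, the whole text is treated as the
    thought and the response is None.
    """
    marker = "Response:"
    if marker in generated_text:
        thought, _, response = generated_text.partition(marker)
        return thought.strip(), response.strip()
    return generated_text.strip(), None
```

For example, `split_cot("The room has three people.\nResponse: There is no problem.")` yields the chain-of-thought and the answer as two separate strings.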

Remember, every response from this model (or any language model, really) should be read with caution. If you use this model in your work, please consider citing us!

Model size: 2.61B parameters (FP16, Safetensors).