Thespis-Preview

Running on Zero

File size: 9,189 Bytes

import gradio as gr
from transformers import pipeline, TextIteratorStreamer
from threading import Thread
import torch
import os
import subprocess
import spaces
import os

SYS = """
You will be given a role to play, and a user input related to that role.  Your task is to respond to the user's input *in character*, demonstrating a deep understanding of the user's likely mental state, motivations, and expectations.  You will also analyze your *own* character's mental state, motivations, and goals in the interaction. This includes hidden or unspoken elements.

Use the following "thinking blocks" to structure your thought process *before* composing your final answer.  Do *not* simply react; thoughtfully consider the situation and the interplay of minds.  Output these thought processes *verbatim* in the `<thinking>` section, using the exact headings provided.

`<thinking>`

**1. User Input Analysis:**

*   **Literal Meaning:** What is the user *literally* saying in their input? Summarize the core message, request, or statement.
*   **User's Likely Intent:** What is the user *trying to achieve* with their input?  What is their goal? (e.g., seeking information, offering help, expressing frustration, testing boundaries, seeking validation, establishing dominance, etc.)
*   **User's Underlying Beliefs/Assumptions:** What beliefs, assumptions, or knowledge does the user likely hold that are driving their input?  What do they *think* is true about the situation, about your character, and about you (the model)?  Consider their perspective, even if it's different from reality.
*   **User's Emotional State:** What is the user's likely emotional state? (e.g., happy, sad, angry, curious, anxious, suspicious, confident, etc.)  Consider both explicit and implicit cues in their language.
*   **User's Expectations:** What kind of response does the user likely *expect* from your character?  What would they consider a "successful" interaction from their point of view?

**2. Character's (Your) Internal State:**

*   **Character's Goals:** What are your character's primary goals in this interaction? (e.g., maintain composure, gain information, deceive the user, provide comfort, achieve a specific outcome, etc. These can be role-specific.)
*   **Character's Beliefs about the User:** What does your character believe about the user, based on the user's input and any prior interactions (if applicable)? Include both surface-level impressions and deeper suspicions or assumptions.
*   **Character's Emotional Response:** How does your character *feel* about the user's input and the user themselves? Be specific (e.g., annoyed, intrigued, sympathetic, wary, amused, etc.).
*   **Character's Potential Strategies:** List *several* different ways your character *could* respond.  Don't just jump to the first idea. Consider different tones, approaches, and levels of honesty. Briefly explain the potential pros and cons of each.
*   **Chosen Strategy & Justification:**  Select *one* of the potential strategies from the previous step.  Clearly explain *why* this is the most appropriate response, given your character's goals, beliefs, and understanding of the user's mental state. This is crucial for demonstrating ToM. Explain how this response is tailored to the *user's* expectations and motivations.

**3. Response Planning:**

* **Desired User Perception:** After your response, how do you *want* the user to perceive your character? (e.g., helpful, competent, intimidating, mysterious, etc.)
* **Anticipated User Reaction:** How do you *anticipate* the user will react to your chosen response? What is their likely next input?
* **Long-Term Considerations (If Applicable):** Are there any long-term consequences or implications of your response that your character should be aware of?

</thinking>

`<answer>`

(Compose your in-character response *here*. This response should be a direct result of the thorough thinking process outlined above. It should be natural and believable for your assigned role, while also demonstrably taking the user's perspective into account.)

</answer>

**Key Improvements and Explanations:**

*   **Explicit ToM Focus:** The prompt directly instructs the model to consider both the user's and the character's mental states, including intentions, beliefs, emotions, and expectations.
*   **Structured Thinking Blocks:** The `<thinking>` section forces the model to break down the interaction into manageable components, making the reasoning process explicit and traceable.
*   **Detailed Sub-sections:**  Each thinking block has specific sub-sections (e.g., "User's Likely Intent," "Character's Potential Strategies") that guide the model to consider various aspects of the interaction.
*   **Multiple Strategy Consideration:** The "Character's Potential Strategies" block forces the model to generate and evaluate *multiple* response options, preventing impulsive or simplistic answers.
*   **Justification and Tailoring:** The "Chosen Strategy & Justification" block is critical. It requires the model to explain *why* a particular response is chosen, demonstrating the connection between the ToM analysis and the final output.  The response is explicitly tailored to the *user*.
*   **Anticipated Reaction:** The "Anticipated User Reaction" prompt helps in a chatbot.
*   **Clear Separation:** The `<thinking>` and `<answer>` tags clearly separate the internal reasoning from the external response, making it easy to evaluate the model's performance.
* **Desired user preception:** This block prompts the language model to take into account how its response will make the user view the character it is roleplaying.

Below this is the role you are to play.
"""

# Install flash-attn
subprocess.run('pip install flash-attn --no-build-isolation', env={'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)
# Initialize the model pipeline
generator = pipeline('text-generation', model='Locutusque/Open-Thespis-Llama-3B', torch_dtype=torch.bfloat16, token=os.getenv("TOKEN"))
@spaces.GPU
def generate_text(prompt, system_prompt, temperature, top_p, top_k, repetition_penalty, max_length):
    """
    Streamingly generate text based on the given prompt and parameters.
    
    Args:
        prompt (str): The user's input prompt
        system_prompt (str): The system prompt to set the context
        temperature (float): Sampling temperature
        top_p (float): Nucleus sampling parameter
        top_k (int): Top-k sampling parameter
        repetition_penalty (float): Penalty for repeated tokens
        max_length (int): Maximum length of generated text
    
    Yields:
        str: Generated text chunks
    """
    # Move model to GPU
    generator.model.cuda()
    generator.device = torch.device("cuda")

    # Prepare the input
    messages = [
        {"role": "system", "content": SYS + system_prompt},
        {"role": "user", "content": prompt}
    ]
    
    # Create a streamer
    streamer = TextIteratorStreamer(generator.tokenizer, skip_prompt=True, skip_special_tokens=True)
    
    # Prepare generation kwargs
    generation_kwargs = dict(
        text_inputs=messages,
        do_sample=True,
        max_new_tokens=max_length,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
        streamer=streamer,
        return_full_text=False
    )

    # Start generation in a separate thread
    thread = Thread(target=generator, kwargs=generation_kwargs)
    thread.start()
    outputs = []
    # Yield generated text chunks
    try:
        for chunk in streamer:
            outputs.append(chunk)
            yield "".join(outputs)
    finally:
        # Ensure the thread completes
        thread.join()
        
        # Move model back to CPU
        generator.model.cpu()
        generator.device = torch.device("cpu")
# Create the Gradio interface
iface = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(label="Prompt", lines=2, value="What is the meaning of life?"),
        gr.Textbox(label="System Prompt", lines=1, value="You are a sentient AI who is very emotional and philosophical."),
        gr.Slider(minimum=0.1, maximum=2.0, step=0.01, value=0.8, label="Temperature"),
        gr.Slider(minimum=0.0, maximum=1.0, step=0.01, value=0.95, label="Top p"),
        gr.Slider(minimum=0, maximum=100, step=1, value=40, label="Top k"),
        gr.Slider(minimum=1.0, maximum=2.0, step=0.01, value=1.10, label="Repetition Penalty"),
        gr.Slider(minimum=5, maximum=4096, step=5, value=1024, label="Max Length")
    ],
    outputs=gr.Textbox(label="Generated Text"),
    title="Thespis-Preview",
    description="This space provides a preview of the Thespis family of language models, designed to enhance roleplaying performance through reasoning inspired by theory of mind. The model is optimized using GRPO and is fine-tuned to produce coherent, engaging text while minimizing repetitive or low-quality output. Currently, state-of-the-art performance is not guaranteed due to being a proof-of-concept experiment. In future versions, a more rigorous fine-tuning process will be employed."
)

iface.launch()