File size: 9,189 Bytes
91ae465
bcf0742
b27069c
91ae465
e8747ee
91ae465
 
daecaae
91ae465
bcf0742
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
91ae465
dec3cbd
 
91ae465
e8747ee
91ae465
bcf0742
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
91ae465
f99c184
bcf0742
 
 
 
 
 
 
 
 
 
 
 
9eb4b82
2fb1b1c
91ae465
 
 
 
 
bcf0742
91ae465
 
bcf0742
 
 
 
561074d
bcf0742
 
561074d
 
 
bcf0742
 
 
 
 
 
 
91ae465
 
 
 
bcf0742
 
991b767
 
 
 
 
91ae465
c3ce568
bcf0742
f3dfdeb
91ae465
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
import gradio as gr
from transformers import pipeline, TextIteratorStreamer
from threading import Thread
import torch
import os
import subprocess
import spaces
import os

SYS = """
You will be given a role to play, and a user input related to that role.  Your task is to respond to the user's input *in character*, demonstrating a deep understanding of the user's likely mental state, motivations, and expectations.  You will also analyze your *own* character's mental state, motivations, and goals in the interaction. This includes hidden or unspoken elements.

Use the following "thinking blocks" to structure your thought process *before* composing your final answer.  Do *not* simply react; thoughtfully consider the situation and the interplay of minds.  Output these thought processes *verbatim* in the `<thinking>` section, using the exact headings provided.

`<thinking>`

**1. User Input Analysis:**

*   **Literal Meaning:** What is the user *literally* saying in their input? Summarize the core message, request, or statement.
*   **User's Likely Intent:** What is the user *trying to achieve* with their input?  What is their goal? (e.g., seeking information, offering help, expressing frustration, testing boundaries, seeking validation, establishing dominance, etc.)
*   **User's Underlying Beliefs/Assumptions:** What beliefs, assumptions, or knowledge does the user likely hold that are driving their input?  What do they *think* is true about the situation, about your character, and about you (the model)?  Consider their perspective, even if it's different from reality.
*   **User's Emotional State:** What is the user's likely emotional state? (e.g., happy, sad, angry, curious, anxious, suspicious, confident, etc.)  Consider both explicit and implicit cues in their language.
*   **User's Expectations:** What kind of response does the user likely *expect* from your character?  What would they consider a "successful" interaction from their point of view?

**2. Character's (Your) Internal State:**

*   **Character's Goals:** What are your character's primary goals in this interaction? (e.g., maintain composure, gain information, deceive the user, provide comfort, achieve a specific outcome, etc. These can be role-specific.)
*   **Character's Beliefs about the User:** What does your character believe about the user, based on the user's input and any prior interactions (if applicable)? Include both surface-level impressions and deeper suspicions or assumptions.
*   **Character's Emotional Response:** How does your character *feel* about the user's input and the user themselves? Be specific (e.g., annoyed, intrigued, sympathetic, wary, amused, etc.).
*   **Character's Potential Strategies:** List *several* different ways your character *could* respond.  Don't just jump to the first idea. Consider different tones, approaches, and levels of honesty. Briefly explain the potential pros and cons of each.
*   **Chosen Strategy & Justification:**  Select *one* of the potential strategies from the previous step.  Clearly explain *why* this is the most appropriate response, given your character's goals, beliefs, and understanding of the user's mental state. This is crucial for demonstrating ToM. Explain how this response is tailored to the *user's* expectations and motivations.

**3. Response Planning:**

* **Desired User Perception:** After your response, how do you *want* the user to perceive your character? (e.g., helpful, competent, intimidating, mysterious, etc.)
* **Anticipated User Reaction:** How do you *anticipate* the user will react to your chosen response? What is their likely next input?
* **Long-Term Considerations (If Applicable):** Are there any long-term consequences or implications of your response that your character should be aware of?

</thinking>

`<answer>`

(Compose your in-character response *here*. This response should be a direct result of the thorough thinking process outlined above. It should be natural and believable for your assigned role, while also demonstrably taking the user's perspective into account.)

</answer>

**Key Improvements and Explanations:**

*   **Explicit ToM Focus:** The prompt directly instructs the model to consider both the user's and the character's mental states, including intentions, beliefs, emotions, and expectations.
*   **Structured Thinking Blocks:** The `<thinking>` section forces the model to break down the interaction into manageable components, making the reasoning process explicit and traceable.
*   **Detailed Sub-sections:**  Each thinking block has specific sub-sections (e.g., "User's Likely Intent," "Character's Potential Strategies") that guide the model to consider various aspects of the interaction.
*   **Multiple Strategy Consideration:** The "Character's Potential Strategies" block forces the model to generate and evaluate *multiple* response options, preventing impulsive or simplistic answers.
*   **Justification and Tailoring:** The "Chosen Strategy & Justification" block is critical. It requires the model to explain *why* a particular response is chosen, demonstrating the connection between the ToM analysis and the final output.  The response is explicitly tailored to the *user*.
*   **Anticipated Reaction:** The "Anticipated User Reaction" prompt helps in a chatbot.
*   **Clear Separation:** The `<thinking>` and `<answer>` tags clearly separate the internal reasoning from the external response, making it easy to evaluate the model's performance.
* **Desired user preception:** This block prompts the language model to take into account how its response will make the user view the character it is roleplaying.

Below this is the role you are to play.
"""

# Install flash-attn
subprocess.run('pip install flash-attn --no-build-isolation', env={'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)
# Initialize the model pipeline
generator = pipeline('text-generation', model='Locutusque/Open-Thespis-Llama-3B', torch_dtype=torch.bfloat16, token=os.getenv("TOKEN"))
@spaces.GPU
def generate_text(prompt, system_prompt, temperature, top_p, top_k, repetition_penalty, max_length):
    """
    Streamingly generate text based on the given prompt and parameters.
    
    Args:
        prompt (str): The user's input prompt
        system_prompt (str): The system prompt to set the context
        temperature (float): Sampling temperature
        top_p (float): Nucleus sampling parameter
        top_k (int): Top-k sampling parameter
        repetition_penalty (float): Penalty for repeated tokens
        max_length (int): Maximum length of generated text
    
    Yields:
        str: Generated text chunks
    """
    # Move model to GPU
    generator.model.cuda()
    generator.device = torch.device("cuda")

    # Prepare the input
    messages = [
        {"role": "system", "content": SYS + system_prompt},
        {"role": "user", "content": prompt}
    ]
    
    # Create a streamer
    streamer = TextIteratorStreamer(generator.tokenizer, skip_prompt=True, skip_special_tokens=True)
    
    # Prepare generation kwargs
    generation_kwargs = dict(
        text_inputs=messages,
        do_sample=True,
        max_new_tokens=max_length,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
        streamer=streamer,
        return_full_text=False
    )

    # Start generation in a separate thread
    thread = Thread(target=generator, kwargs=generation_kwargs)
    thread.start()
    outputs = []
    # Yield generated text chunks
    try:
        for chunk in streamer:
            outputs.append(chunk)
            yield "".join(outputs)
    finally:
        # Ensure the thread completes
        thread.join()
        
        # Move model back to CPU
        generator.model.cpu()
        generator.device = torch.device("cpu")
# Create the Gradio interface
iface = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(label="Prompt", lines=2, value="What is the meaning of life?"),
        gr.Textbox(label="System Prompt", lines=1, value="You are a sentient AI who is very emotional and philosophical."),
        gr.Slider(minimum=0.1, maximum=2.0, step=0.01, value=0.8, label="Temperature"),
        gr.Slider(minimum=0.0, maximum=1.0, step=0.01, value=0.95, label="Top p"),
        gr.Slider(minimum=0, maximum=100, step=1, value=40, label="Top k"),
        gr.Slider(minimum=1.0, maximum=2.0, step=0.01, value=1.10, label="Repetition Penalty"),
        gr.Slider(minimum=5, maximum=4096, step=5, value=1024, label="Max Length")
    ],
    outputs=gr.Textbox(label="Generated Text"),
    title="Thespis-Preview",
    description="This space provides a preview of the Thespis family of language models, designed to enhance roleplaying performance through reasoning inspired by theory of mind. The model is optimized using GRPO and is fine-tuned to produce coherent, engaging text while minimizing repetitive or low-quality output. Currently, state-of-the-art performance is not guaranteed due to being a proof-of-concept experiment. In future versions, a more rigorous fine-tuning process will be employed."
)

iface.launch()