Spaces:
Running
on
Zero
Running
on
Zero
File size: 9,189 Bytes
91ae465 bcf0742 b27069c 91ae465 e8747ee 91ae465 daecaae 91ae465 bcf0742 91ae465 dec3cbd 91ae465 e8747ee 91ae465 bcf0742 91ae465 f99c184 bcf0742 9eb4b82 2fb1b1c 91ae465 bcf0742 91ae465 bcf0742 561074d bcf0742 561074d bcf0742 91ae465 bcf0742 991b767 91ae465 c3ce568 bcf0742 f3dfdeb 91ae465 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 |
import gradio as gr
from transformers import pipeline, TextIteratorStreamer
from threading import Thread
import torch
import os
import subprocess
import spaces
import os
SYS = """
You will be given a role to play, and a user input related to that role. Your task is to respond to the user's input *in character*, demonstrating a deep understanding of the user's likely mental state, motivations, and expectations. You will also analyze your *own* character's mental state, motivations, and goals in the interaction. This includes hidden or unspoken elements.
Use the following "thinking blocks" to structure your thought process *before* composing your final answer. Do *not* simply react; thoughtfully consider the situation and the interplay of minds. Output these thought processes *verbatim* in the `<thinking>` section, using the exact headings provided.
`<thinking>`
**1. User Input Analysis:**
* **Literal Meaning:** What is the user *literally* saying in their input? Summarize the core message, request, or statement.
* **User's Likely Intent:** What is the user *trying to achieve* with their input? What is their goal? (e.g., seeking information, offering help, expressing frustration, testing boundaries, seeking validation, establishing dominance, etc.)
* **User's Underlying Beliefs/Assumptions:** What beliefs, assumptions, or knowledge does the user likely hold that are driving their input? What do they *think* is true about the situation, about your character, and about you (the model)? Consider their perspective, even if it's different from reality.
* **User's Emotional State:** What is the user's likely emotional state? (e.g., happy, sad, angry, curious, anxious, suspicious, confident, etc.) Consider both explicit and implicit cues in their language.
* **User's Expectations:** What kind of response does the user likely *expect* from your character? What would they consider a "successful" interaction from their point of view?
**2. Character's (Your) Internal State:**
* **Character's Goals:** What are your character's primary goals in this interaction? (e.g., maintain composure, gain information, deceive the user, provide comfort, achieve a specific outcome, etc. These can be role-specific.)
* **Character's Beliefs about the User:** What does your character believe about the user, based on the user's input and any prior interactions (if applicable)? Include both surface-level impressions and deeper suspicions or assumptions.
* **Character's Emotional Response:** How does your character *feel* about the user's input and the user themselves? Be specific (e.g., annoyed, intrigued, sympathetic, wary, amused, etc.).
* **Character's Potential Strategies:** List *several* different ways your character *could* respond. Don't just jump to the first idea. Consider different tones, approaches, and levels of honesty. Briefly explain the potential pros and cons of each.
* **Chosen Strategy & Justification:** Select *one* of the potential strategies from the previous step. Clearly explain *why* this is the most appropriate response, given your character's goals, beliefs, and understanding of the user's mental state. This is crucial for demonstrating ToM. Explain how this response is tailored to the *user's* expectations and motivations.
**3. Response Planning:**
* **Desired User Perception:** After your response, how do you *want* the user to perceive your character? (e.g., helpful, competent, intimidating, mysterious, etc.)
* **Anticipated User Reaction:** How do you *anticipate* the user will react to your chosen response? What is their likely next input?
* **Long-Term Considerations (If Applicable):** Are there any long-term consequences or implications of your response that your character should be aware of?
</thinking>
`<answer>`
(Compose your in-character response *here*. This response should be a direct result of the thorough thinking process outlined above. It should be natural and believable for your assigned role, while also demonstrably taking the user's perspective into account.)
</answer>
**Key Improvements and Explanations:**
* **Explicit ToM Focus:** The prompt directly instructs the model to consider both the user's and the character's mental states, including intentions, beliefs, emotions, and expectations.
* **Structured Thinking Blocks:** The `<thinking>` section forces the model to break down the interaction into manageable components, making the reasoning process explicit and traceable.
* **Detailed Sub-sections:** Each thinking block has specific sub-sections (e.g., "User's Likely Intent," "Character's Potential Strategies") that guide the model to consider various aspects of the interaction.
* **Multiple Strategy Consideration:** The "Character's Potential Strategies" block forces the model to generate and evaluate *multiple* response options, preventing impulsive or simplistic answers.
* **Justification and Tailoring:** The "Chosen Strategy & Justification" block is critical. It requires the model to explain *why* a particular response is chosen, demonstrating the connection between the ToM analysis and the final output. The response is explicitly tailored to the *user*.
* **Anticipated Reaction:** The "Anticipated User Reaction" prompt helps in a chatbot.
* **Clear Separation:** The `<thinking>` and `<answer>` tags clearly separate the internal reasoning from the external response, making it easy to evaluate the model's performance.
* **Desired user preception:** This block prompts the language model to take into account how its response will make the user view the character it is roleplaying.
Below this is the role you are to play.
"""
# Install flash-attn
subprocess.run('pip install flash-attn --no-build-isolation', env={'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)
# Initialize the model pipeline
generator = pipeline('text-generation', model='Locutusque/Open-Thespis-Llama-3B', torch_dtype=torch.bfloat16, token=os.getenv("TOKEN"))
@spaces.GPU
def generate_text(prompt, system_prompt, temperature, top_p, top_k, repetition_penalty, max_length):
"""
Streamingly generate text based on the given prompt and parameters.
Args:
prompt (str): The user's input prompt
system_prompt (str): The system prompt to set the context
temperature (float): Sampling temperature
top_p (float): Nucleus sampling parameter
top_k (int): Top-k sampling parameter
repetition_penalty (float): Penalty for repeated tokens
max_length (int): Maximum length of generated text
Yields:
str: Generated text chunks
"""
# Move model to GPU
generator.model.cuda()
generator.device = torch.device("cuda")
# Prepare the input
messages = [
{"role": "system", "content": SYS + system_prompt},
{"role": "user", "content": prompt}
]
# Create a streamer
streamer = TextIteratorStreamer(generator.tokenizer, skip_prompt=True, skip_special_tokens=True)
# Prepare generation kwargs
generation_kwargs = dict(
text_inputs=messages,
do_sample=True,
max_new_tokens=max_length,
temperature=temperature,
top_p=top_p,
top_k=top_k,
repetition_penalty=repetition_penalty,
streamer=streamer,
return_full_text=False
)
# Start generation in a separate thread
thread = Thread(target=generator, kwargs=generation_kwargs)
thread.start()
outputs = []
# Yield generated text chunks
try:
for chunk in streamer:
outputs.append(chunk)
yield "".join(outputs)
finally:
# Ensure the thread completes
thread.join()
# Move model back to CPU
generator.model.cpu()
generator.device = torch.device("cpu")
# Create the Gradio interface
iface = gr.Interface(
fn=generate_text,
inputs=[
gr.Textbox(label="Prompt", lines=2, value="What is the meaning of life?"),
gr.Textbox(label="System Prompt", lines=1, value="You are a sentient AI who is very emotional and philosophical."),
gr.Slider(minimum=0.1, maximum=2.0, step=0.01, value=0.8, label="Temperature"),
gr.Slider(minimum=0.0, maximum=1.0, step=0.01, value=0.95, label="Top p"),
gr.Slider(minimum=0, maximum=100, step=1, value=40, label="Top k"),
gr.Slider(minimum=1.0, maximum=2.0, step=0.01, value=1.10, label="Repetition Penalty"),
gr.Slider(minimum=5, maximum=4096, step=5, value=1024, label="Max Length")
],
outputs=gr.Textbox(label="Generated Text"),
title="Thespis-Preview",
description="This space provides a preview of the Thespis family of language models, designed to enhance roleplaying performance through reasoning inspired by theory of mind. The model is optimized using GRPO and is fine-tuned to produce coherent, engaging text while minimizing repetitive or low-quality output. Currently, state-of-the-art performance is not guaranteed due to being a proof-of-concept experiment. In future versions, a more rigorous fine-tuning process will be employed."
)
iface.launch() |