---
title: HOMER.OS
emoji: 🐳
colorFrom: purple
colorTo: black
sdk: docker
app_port: 7860
---
# HOMER.OS

*The operating system for the next age of co-creative storytelling.*

This demo explores the future of interactive storytelling. It puts the user in charge of how the story develops.
Here is how it works:
- The user interacts with the system by means of audio messages
- The experience starts with the user providing the details of the hero, the style, and the story they'd like to play.
- The system creates the beginning of the story and reads it out loud to the user. The system then asks what the hero should do next.
- The user answers via voice message
- The system takes the user input into account, generates the next chunk of the story and reads it out to the user
- The loop continues until, after X messages, the system decides to end the story (for now, to avoid exceeding the GPT context window)
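The loop above can be sketched as a small driver function. The callback names (`transcribe`, `generate_next_chunk`, `speak`) are hypothetical stand-ins for the Whisper, GPT-4, and Play.ht calls, and `MAX_CHUNKS` is an assumed limit, not a value from the actual implementation:

```python
MAX_CHUNKS = 10  # assumed limit to stay within the GPT context window

def run_story_loop(get_user_audio, transcribe, generate_next_chunk, speak):
    """Drive the tell-ask-listen loop until the chunk limit is reached.

    All four callbacks are injected stand-ins: audio capture, Whisper-style
    speech-to-text, GPT-style generation, and Play.ht-style text-to-speech.
    """
    chunks = []
    user_text = None
    while len(chunks) < MAX_CHUNKS:
        # On the final iteration, ask the generator to wrap the story up.
        ending = len(chunks) == MAX_CHUNKS - 1
        chunk = generate_next_chunk(chunks, user_text, ending)
        chunks.append(chunk)
        speak(chunk)
        if ending:
            break
        # Otherwise, wait for the user's voice answer and transcribe it.
        user_text = transcribe(get_user_audio())
    return chunks
```

With stubbed callbacks this produces exactly `MAX_CHUNKS` chunks, with the ending flag set only on the last generation call.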
## Tech stack for the demo
- GPT-4 for story generation
- Whisper for speech to text
- Play.ht for voice generation
- Gradio for interface
- Hugging Face Spaces for deployment
## Story schema

| Field | Type | Description |
|---|---|---|
| `uuid` | STRING | uuid of this story |
| `status` | STRING | `'not_started'` / `'checking_magic_word'` / `'defining_metadata'` / `'ongoing'` / `'finished'` etc. |
| `world` | TEXT | text description of the world |
| `hero` | TEXT | text description of the hero of the story |
| `plot` | TEXT | high-level description of the plot, without chapters or anything like that; we can later use this to break the story into chapters and get smarter about story arc management with a second LLM |
| `ending` | STRING | what kind of ending we want, e.g. happy or tragic |
| `style` | STRING | text description of the style of storytelling |
| `voice` | STRING | id of the voice we are using to voice the story |
| `chunks` | TEXT(JSON) | JSON array of story chunks; each chunk has `{"text", "audio_url"}` |
| `messages` | TEXT(JSON) | JSON array of messages in the OpenAI-compatible format `{"role": "system"/"user"/"assistant", "content": message}` |
| `full_story_audio_url` | STRING | url of the full rendered audio story (story chunk audio combined) |
| `full_story_text` | TEXT | full story text |
## Flow
- Welcome the user
- Ask for the magic word
- Check the magic word - if it's wrong, apologize and tell them how to get it
- Once we have the magic word - generate uuid and kickstart story configuration:
- say "Let me now ask you a few questions about the story you'd like to hear..."
- ask the user about the world their story should happen in
- ask the user about the hero and save it
- ask the user about the plot and save it
- ask the user if they want the story to end in a happy way or in a sad way (free user input) and save it
- ask the user about the style and save it
- Say "Our story is all set! Let it begin."
- Tell the first paragraph / part and then ask at the end "What do you think the hero should do next?"
- Process user input, generate the next chunk and repeat
- If the number of chunks (or total tokens in the story) is approaching the limit - end the story by passing a constructed user message that references the chosen type of ending
- Thank the user and say goodbye
- If the user records more messages - reply with a fixed message saying that this story has ended, and that if they want another one, they can come again.
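The flow above can be modeled as transitions between the `status` values from the story schema. A sketch under assumptions: the guard flags and the `advance` helper are illustrative, not part of the actual implementation:

```python
# Linear status progression implied by the flow; keys and values are the
# status strings from the story schema.
TRANSITIONS = {
    "not_started": "checking_magic_word",
    "checking_magic_word": "defining_metadata",  # only after a correct magic word
    "defining_metadata": "ongoing",              # once world/hero/plot/ending/style are saved
    "ongoing": "finished",                       # when the chunk/token limit is reached
}

def advance(status, *, magic_word_ok=True, metadata_done=True, limit_reached=True):
    """Return the next status, or the current one if its guard is not met."""
    guards = {
        "checking_magic_word": magic_word_ok,
        "defining_metadata": metadata_done,
        "ongoing": limit_reached,
    }
    if status == "finished" or not guards.get(status, True):
        # Stay put: wrong magic word, missing metadata, story still running,
        # or the story is already over.
        return status
    return TRANSITIONS[status]
```

Keeping the flow as explicit status transitions makes the "fixed message after the end" case trivial: `finished` is absorbing.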
## Basic ToDo
- Gradio input/outputs/state setup (with text only)
- Story object setup, schema, logic
- Set up flow management
- Add SQLite DB and save stories
- GPT-4 story generation in a gradio interface
- Do the evaluator (if it's time to end)
- Interchange text output for Play.ht voice generation
- Expose a switch to the user for the max length of the story and whether to ask about details or not
- Interchange text input for whisper
- Clear input on submit [too tricky with gradio]
- Dockerfile and deploy (including magic word for access control)
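For the Dockerfile item, a minimal sketch assuming the Gradio app lives in `app.py` (the filename and requirements layout are assumptions); the Space metadata declares `app_port: 7860`, so Gradio must listen there:

```dockerfile
FROM python:3.10-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Bind Gradio to the port declared in the Space metadata (app_port: 7860)
ENV GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860
EXPOSE 7860

CMD ["python", "app.py"]
```

The magic word for access control would be supplied as a Space secret rather than baked into the image.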
## Enhancements
- Toggle between text and audio versions of the output, not just the input
- Add option to download the full story as one .mp3 after the end
- Add option to download full story text after the end
- Add a meta-moderator role to manage the story arc better