---
title: HOMER.OS
emoji: 🐳
colorFrom: purple
colorTo: black
sdk: docker
app_port: 7860
---
# HOMER.OS

*The operating system for the next age of co-creative storytelling.*

This demo explores the future of interactive storytelling. It puts the user in charge of how the story develops.
Here is how it works:
- The user interacts with the system by means of audio messages
- The experience starts with the user providing the details of the hero, the style, and the story they'd like to play.
- The system creates the beginning of the story and reads it out loud to the user. The system then asks what the hero should do next.
- The user answers via voice message
- The system takes the user input into account, generates the next chunk of the story and reads it out to the user
- The loop continues until, after X messages, the system decides to end the story (for now, to avoid exceeding the GPT context window)
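The loop above can be sketched as a small driver function. The callback names (`transcribe`, `generate_next_chunk`, `speak`) are hypothetical stand-ins for the Whisper, GPT-4, and Play.ht calls, and `MAX_CHUNKS` is an assumed limit, not a value from the actual implementation:

```python
MAX_CHUNKS = 10  # assumed limit to stay within the GPT context window

def run_story_loop(get_user_audio, transcribe, generate_next_chunk, speak):
    """Drive the tell-ask-listen loop until the chunk limit is reached.

    All four callbacks are injected stand-ins: audio capture, Whisper-style
    speech-to-text, GPT-style generation, and Play.ht-style text-to-speech.
    """
    chunks = []
    user_text = None
    while len(chunks) < MAX_CHUNKS:
        # On the final iteration, ask the generator to wrap the story up.
        ending = len(chunks) == MAX_CHUNKS - 1
        chunk = generate_next_chunk(chunks, user_text, ending)
        chunks.append(chunk)
        speak(chunk)
        if ending:
            break
        # Otherwise, wait for the user's voice answer and transcribe it.
        user_text = transcribe(get_user_audio())
    return chunks
```

With stubbed callbacks this produces exactly `MAX_CHUNKS` chunks, with the ending flag set only on the last generation call.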
## Tech stack for the demo
- GPT-4 for story generation
- Whisper for speech to text
- Play.ht for voice generation
- Gradio for interface
- Hugging Face Spaces for deployment
## Story schema

| Field | Type | Description |
|---|---|---|
| `uuid` | STRING | uuid of this story |
| `status` | STRING | `'not_started'` / `'checking_magic_word'` / `'defining_metadata'` / `'ongoing'` / `'finished'` etc. |
| `world` | TEXT | text description of the world |
| `hero` | TEXT | text description of the hero of the story |
| `plot` | TEXT | high-level description of the plot, without chapters or anything like that; we can later use this to break the story into chapters and get smarter about story arc management with a second LLM |
| `ending` | STRING | what kind of ending we want, e.g. happy or tragic |
| `style` | STRING | text description of the style of storytelling |
| `voice` | STRING | id of the voice we are using to voice the story |
| `chunks` | TEXT(JSON) | JSON array of story chunks; each chunk has `{"text", "audio_url"}` |
| `messages` | TEXT(JSON) | JSON array of messages in the OpenAI-compatible format `{"role": "system"/"user"/"assistant", "content": message}` |
| `full_story_audio_url` | STRING | url of the full rendered audio story (story chunk audio combined) |
| `full_story_text` | TEXT | full story text |
## Flow
- Welcome the user
- Ask for the magic word
- Check the magic word - if it's wrong, apologize and tell them how to get it
- Once we have the magic word - generate uuid and kickstart story configuration:
- say "Let me now ask you a few questions about the story you'd like to hear..."
- ask the user about the world their story should happen in
- ask the user about the hero and save it
- ask the user about the plot and save it
- ask the user if they want the story to end in a happy way or in a sad way (free user input) and save it
- ask the user about the style and save it
- Say "Our story is all set! Let it begin."
- Tell the first paragraph / part and then ask at the end "What do you think the hero should do next?"
- Process user input, generate the next chunk and repeat
- If the number of chunks (or total tokens in the story) is approaching the limit - end the story by passing a constructed user message that references the chosen type of ending
- Thank the user and say goodbye
- If the user records more messages - reply with a fixed message saying that this story has ended, and that if they want another one, they can come again.
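The flow above can be modeled as transitions between the `status` values from the story schema. A sketch under assumptions: the guard flags and the `advance` helper are illustrative, not part of the actual implementation:

```python
# Linear status progression implied by the flow; keys and values are the
# status strings from the story schema.
TRANSITIONS = {
    "not_started": "checking_magic_word",
    "checking_magic_word": "defining_metadata",  # only after a correct magic word
    "defining_metadata": "ongoing",              # once world/hero/plot/ending/style are saved
    "ongoing": "finished",                       # when the chunk/token limit is reached
}

def advance(status, *, magic_word_ok=True, metadata_done=True, limit_reached=True):
    """Return the next status, or the current one if its guard is not met."""
    guards = {
        "checking_magic_word": magic_word_ok,
        "defining_metadata": metadata_done,
        "ongoing": limit_reached,
    }
    if status == "finished" or not guards.get(status, True):
        # Stay put: wrong magic word, missing metadata, story still running,
        # or the story is already over.
        return status
    return TRANSITIONS[status]
```

Keeping the flow as explicit status transitions makes the "fixed message after the end" case trivial: `finished` is absorbing.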
## Basic ToDo
- Gradio input/outputs/state setup (with text only)
- Story object setup, schema, logic
- Set up flow management
- Add SQLite DB and save stories
- GPT-4 story generation in a gradio interface
- Do the evaluator (if it's time to end)
- Interchange text output for Play.ht voice generation
- Expose a switch to the user for the max length of the story and whether to ask about details or not
- Interchange text input for whisper
- Clear input on submit [too tricky with gradio]
- Dockerfile and deploy (including magic word for access control)
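For the Dockerfile item, a minimal sketch assuming the Gradio app lives in `app.py` (the filename and requirements layout are assumptions); the Space metadata declares `app_port: 7860`, so Gradio must listen there:

```dockerfile
FROM python:3.10-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Bind Gradio to the port declared in the Space metadata (app_port: 7860)
ENV GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860
EXPOSE 7860

CMD ["python", "app.py"]
```

The magic word for access control would be supplied as a Space secret rather than baked into the image.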
## Enhancements
- Toggle between text and audio versions of the output, not just the input
- Add option to download the full story as one .mp3 after the end
- Add option to download full story text after the end
- Add a meta-moderator role to manage the story arc better