Spaces:
Sleeping
Sleeping
title: HOMER.OS | |
emoji: 🐳 | |
colorFrom: purple | |
colorTo: black | |
sdk: docker | |
app_port: 7860 | |
# HOMER.OS | |
### the operating system for the next age of co-creative storytelling | |
This demo is exploring the future of interactive storytelling. | |
It puts the user in charge of a how the story is going to develop. | |
## Here is how it works: | |
1. The user interacts with the system by means of audio messages | |
2. The experience starts with the user inputting the details of the hero, the style and the story they'd like to play. | |
3. The system creates the beginning of the story and reads it outloud to the user. The system the asks what the hero should do next. | |
4. The user answers via voice message | |
5. The system takes the user input into account, generates the next chunk of the story and reads it out to the user | |
6. The loop continues until after X messages the system decides to end the story (to prevent from exceeding GPT context window for now) | |
## Tech. Stack for the demo: | |
- GPT-4 for story generation | |
- Whisper for speech to text | |
- Play.ht for voice generation | |
- Gradio for interface | |
- Gradio Spaces for deployment | |
## Story schema | |
- STRING `uuid` = uuid of this story | |
- STRING `status` = 'not_started' / 'checking_magic_word' / 'defining_metadata' / 'ongoing' / 'finished' ...etc. | |
- TEXT `world` = text description of the world | |
- TEXT `hero` = text description of the hero of the story | |
- TEXT `plot` = high level description of the plot. without chapters or anything like that. we can use this to later break down into chapters and get smarter about story ark management with a second LLM | |
- STRING `ending` = text string representing what kind of ending we want e.g. happy or tragic | |
- STRING `style` = text description of the style of story-telling | |
- STRING `voice` = id of the voice we are using for sounding the story | |
- TEXT(JSON) `chunks` = JSON array of story-chunks. each chunk has {"text", "audio_url"} | |
- TEXT(JSON) `messages` = JSON array of messages in the openAI compatible format {role=system/user/assistant content=message} | |
- STRING `full_story_audio_url` = url of the full rendered audio story (story chunks audio combined) | |
- TEXT `full_story_text` = full story text | |
## Flow | |
1. Welcome the user | |
2. Ask for the magic word | |
3. Check the magic word - if not apologize and tell them how to get it | |
4. Once we have the magic word - generate uuid and kickstart story configuration: | |
- say "Let me now ask you a few questions about the story you'd like to hear..." | |
- ask the user about the world their story should happen in | |
- ask the user about the hero and save it | |
- ask the user about the plot and save it | |
- ask the user if they want the story to end in a happy way or in a sad way (free user input) and save it | |
- ask the user about the style and save it | |
5. Say "Our story is all set! Let it begin." | |
6. Tell the first paragraph / part and then ask at the end "What do you think should the hero do next?" | |
7. Process user input, generate the next chunk and repeat | |
8. If number of chunks (or total tokens in the story) is approaching the limit - end the story by passing a constructed user message that references the type of ending | |
9. Thank the user and say goodby | |
10. If the user records more messages - say a fixed message that this story has ended but the user wants another one, they can come again. | |
## Basic ToDo | |
- [x] Gradio input/outpus/state setup (with text only) | |
- [x] Story object setup, schema, logic | |
- [x] Set up flow management | |
- [x] Add SQlite DB and save stories | |
- [x] GPT-4 story generation in a gradio interface | |
- [x] Do the evaluator (if it's time to end) | |
- [x] Inerchange text output for play.ht voice generation | |
- [x] Expose switch to the user on what's the max lenght of story and whether ask about details or not | |
- [x] Interchange text input for whisper | |
- [ ] Clear input on submit [too tricky with gradio] | |
- [x] Dockerfile and deploy (including magic word for access control) | |
## Enhancements | |
- [ ] toggle between text and audio versions of output, not just input | |
- [ ] Add option to download the full story as one .mp3 after the end | |
- [ ] Add option to download full story text after the end | |
- [ ] Add meta-moderator role to manage story ark better | |