homeros_demo / README.md
papayaga's picture
minor
be67891
metadata
title: HOMER.OS
emoji: 🐳
colorFrom: purple
colorTo: black
sdk: docker
app_port: 7860

HOMER.OS

the operating system for the next age of co-creative storytelling

This demo is exploring the future of interactive storytelling. It puts the user in charge of a how the story is going to develop.

Here is how it works:

  1. The user interacts with the system by means of audio messages
  2. The experience starts with the user inputting the details of the hero, the style and the story they'd like to play.
  3. The system creates the beginning of the story and reads it outloud to the user. The system the asks what the hero should do next.
  4. The user answers via voice message
  5. The system takes the user input into account, generates the next chunk of the story and reads it out to the user
  6. The loop continues until after X messages the system decides to end the story (to prevent from exceeding GPT context window for now)

Tech. Stack for the demo:

  • GPT-4 for story generation
  • Whisper for speech to text
  • Play.ht for voice generation
  • Gradio for interface
  • Gradio Spaces for deployment

Story schema

  • STRING uuid = uuid of this story
  • STRING status = 'not_started' / 'checking_magic_word' / 'defining_metadata' / 'ongoing' / 'finished' ...etc.
  • TEXT world = text description of the world
  • TEXT hero = text description of the hero of the story
  • TEXT plot = high level description of the plot. without chapters or anything like that. we can use this to later break down into chapters and get smarter about story ark management with a second LLM
  • STRING ending = text string representing what kind of ending we want e.g. happy or tragic
  • STRING style = text description of the style of story-telling
  • STRING voice = id of the voice we are using for sounding the story
  • TEXT(JSON) chunks = JSON array of story-chunks. each chunk has {"text", "audio_url"}
  • TEXT(JSON) messages = JSON array of messages in the openAI compatible format {role=system/user/assistant content=message}
  • STRING full_story_audio_url = url of the full rendered audio story (story chunks audio combined)
  • TEXT full_story_text = full story text

Flow

  1. Welcome the user
  2. Ask for the magic word
  3. Check the magic word - if not apologize and tell them how to get it
  4. Once we have the magic word - generate uuid and kickstart story configuration:
    • say "Let me now ask you a few questions about the story you'd like to hear..."
    • ask the user about the world their story should happen in
    • ask the user about the hero and save it
    • ask the user about the plot and save it
    • ask the user if they want the story to end in a happy way or in a sad way (free user input) and save it
    • ask the user about the style and save it
  5. Say "Our story is all set! Let it begin."
  6. Tell the first paragraph / part and then ask at the end "What do you think should the hero do next?"
  7. Process user input, generate the next chunk and repeat
  8. If number of chunks (or total tokens in the story) is approaching the limit - end the story by passing a constructed user message that references the type of ending
  9. Thank the user and say goodby
  10. If the user records more messages - say a fixed message that this story has ended but the user wants another one, they can come again.

Basic ToDo

  • Gradio input/outpus/state setup (with text only)
  • Story object setup, schema, logic
  • Set up flow management
  • Add SQlite DB and save stories
  • GPT-4 story generation in a gradio interface
  • Do the evaluator (if it's time to end)
  • Inerchange text output for play.ht voice generation
  • Expose switch to the user on what's the max lenght of story and whether ask about details or not
  • Interchange text input for whisper
  • Clear input on submit [too tricky with gradio]
  • Dockerfile and deploy (including magic word for access control)

Enhancements

  • toggle between text and audio versions of output, not just input
  • Add option to download the full story as one .mp3 after the end
  • Add option to download full story text after the end
  • Add meta-moderator role to manage story ark better