2024.11.13
Demo architecture for building an interactive agent that keeps our waifu personality. (Tentative; it may change.)
- Waifu: a waifu LLM or LMM plus TTS.
  - Text Response Model: maybe spow12/ChatWaifu_2.0_vision or spow12/ChatWaifu_12B_v2.0?
  - TTS: Style-Bert-VITS2; I will use spow12/visual_novel_tts.
- Planner: a powerful LLM that can call tools and generate a plan for solving a task. Ideally ChatGPT or Claude, but let's start from open source.
  - Model: (Undecided)
- Interactor: interacts and takes actions through the GUI.
  - Model: OS-Copilot/OS-Atlas-Base-7B
Let's start from here.
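A minimal sketch of how the three roles above could be wired together, just to make the data flow concrete. Everything here is an assumption for illustration: the `Agent` class, its field names, and the stub lambdas are hypothetical stand-ins, not the actual models or any library API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """Hypothetical wiring of the three roles; every callable is a stand-in."""
    plan: Callable[[str], List[str]]  # Planner: task -> list of GUI steps
    act: Callable[[str], str]         # Interactor: GUI step -> observation
    respond: Callable[[str], str]     # Waifu text model: context -> reply in persona
    speak: Callable[[str], bytes]     # TTS: reply text -> audio bytes

    def run(self, task: str) -> Tuple[str, bytes]:
        steps = self.plan(task)
        observations = [self.act(step) for step in steps]
        reply = self.respond(f"Task: {task}\nResults: {observations}")
        return reply, self.speak(reply)


# Stub components so the sketch runs without any model loaded.
demo = Agent(
    plan=lambda task: [f"open browser for '{task}'", "read the first result"],
    act=lambda step: f"done: {step}",
    respond=lambda context: f"Here is what I found! ({context.count('done')} steps finished)",
    speak=lambda text: text.encode("utf-8"),  # placeholder for synthesized audio
)

reply, audio = demo.run("check the weather")
```

Swapping a stub for a real model should only require replacing the matching callable, which keeps the persona, planning, and GUI parts independent of each other.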
2024.12.10
Planner: required tools:
- Waifu LLM: generate our waifu's text response
- TTS: generate speech for our waifu's response
- General tools for solving tasks, such as web_search, visiting web pages, coding, vision modality, etc. These tools can be implemented as managed agents.
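The tool list above could be exposed to the planner through a simple name-based registry. This is only a sketch under assumptions: `TOOLS`, `tool`, `call_tool`, and the stub tool bodies are hypothetical names made up for illustration, and real tools would call the actual models or managed agents.

```python
from typing import Callable, Dict

# Hypothetical registry: tool name -> callable taking and returning text.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Register a function under a name the planner can refer to."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("waifu_llm")
def waifu_llm(prompt: str) -> str:
    # Stand-in for the waifu text response model.
    return f"[persona reply to: {prompt}]"


@tool("tts")
def tts(text: str) -> str:
    # Stand-in for speech synthesis; returns a fake audio file name.
    return f"audio_{len(text)}.wav"


@tool("web_search")
def web_search(query: str) -> str:
    # Stand-in for a general-purpose tool (could be a managed agent).
    return f"[search results for: {query}]"


def call_tool(name: str, argument: str) -> str:
    """Dispatch a planner tool call by name, failing loudly on unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](argument)
```

For example, `call_tool("web_search", "weather")` returns the stub search result, and an unregistered name raises `KeyError` instead of failing silently.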
TTS: Fish Speech 1.5 could be an option for more flexible TTS generation across multiple languages... Let's wait for the author to implement fine-tuning.
Let's start from ReactCodeAgent. It is slow to execute but a good starting point.
I think parallel tool calling is required to speed up generation.
Maybe I have to modify the system prompt for further optimization.
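As a rough illustration of the parallel tool calling idea noted above, independent tool calls can be dispatched concurrently with the standard library's thread pool. This is a sketch under assumptions, not how ReactCodeAgent works internally: `slow_tool` and `call_parallel` are hypothetical helpers, and the sleep only mimics model or network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def slow_tool(name: str) -> Callable[[str], str]:
    """Build a stub tool that sleeps briefly to mimic network or model latency."""
    def run(argument: str) -> str:
        time.sleep(0.2)
        return f"{name}({argument})"
    return run


def call_parallel(calls: List[Tuple[Callable[[str], str], str]]) -> List[str]:
    """Run independent tool calls concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(calls) or 1) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in calls]
        return [future.result() for future in futures]


calls = [
    (slow_tool("web_search"), "weather"),
    (slow_tool("visit_web"), "example.com"),
    (slow_tool("vision"), "screenshot.png"),
]
start = time.perf_counter()
results = call_parallel(calls)
elapsed = time.perf_counter() - start  # roughly one tool's latency, not three
```

This only helps when the calls do not depend on each other; steps that need an earlier result still have to run in order.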
2024.12.23
This architecture looks great for an agent system that can solve general tasks and keep our waifu persona.
Let's implement this system as a starting point.
In my personal experience, ChatWaifu_72B_v2.2 is good enough for the agent system, so I will use that.