---
language:
- en
---

# 2024.11.13

Demo architecture for building an interactable agent that keeps our waifu personality. (Maybe? It can be changed.)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61960aa548981535eeb84cac/v1D4-iJ-ckaF_IeLg3s5V.png)

- Waifu: Waifu LLM or LMM, plus TTS.
  - Text Response Model: maybe [spow12/ChatWaifu_2.0_vision](https://huggingface.co/spow12/ChatWaifu_2.0_vision) or [spow12/ChatWaifu_12B_v2.0](https://huggingface.co/spow12/ChatWaifu_12B_v2.0)..?
  - TTS: [Style-Bert-VITS2](https://github.com/litagin02/Style-Bert-VITS2); I will use [spow12/visual_novel_tts](https://huggingface.co/spow12/visual_novel_tts)
- Planner: a powerful LLM that can call tools and generate a plan for solving the task. Ideally ChatGPT or Claude... but let's start from open source.
  - Model: (Undecided)
- Interactor: interacts and takes actions through the GUI.
  - Model: [OS-Copilot/OS-Atlas-Base-7B](https://huggingface.co/OS-Copilot/OS-Atlas-Base-7B)

Let's start from here.

# 2024.12.10

- Planner: required tools:
  - Waifu LLM: generates our waifu's response
  - TTS: synthesizes speech for our waifu's response
  - General tools for solving tasks, like web_search, visiting web pages, coding, vision modality, etc... These tools can be implemented as managed agents.
- TTS: Fish Speech 1.5 could be an option for more flexible TTS generation across multiple languages... Let's wait for the author to implement fine-tuning.

Let's start from ReactCodeAgent. It is slow to execute but a good starting point. I think parallel tool calling is required to accelerate generation speed. Maybe I have to modify the system prompt for more optimization..

# 2024.12.23

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61960aa548981535eeb84cac/UniYxhcilgTX5tz4X2-r0.png)

[This architecture](https://arxiv.org/abs/2410.08328) looks great for an agent system that can *solve general tasks* while keeping our *waifu persona*. Let's implement this system as a starting point.

In my personal experience, [ChatWaifu_72B_v2.2](https://huggingface.co/spow12/ChatWaifu_72B_v2.2) is good to go for the agent system. I will use that.

# 2025.01.13

Great.. feature development is almost done.

1. My Waifu (Talker) stores its belief state and conversation history in MongoDB, so that it can delegate difficult tasks to the Reasoner.
2. The Reasoner reads the Waifu's belief state and conversation history from MongoDB, runs its reasoning steps, and stores its belief state back to MongoDB.
3. My Waifu reads the Reasoner's belief state from MongoDB and answers given that context.

Now I have to implement APIs for decoupling the two agents (sketches follow below). For now, I'm planning to use Hugging Face agents and LangChain. After building the system, maybe I will have to migrate it to LangGraph for a more complex, fluent system build... but I'm not familiar with LangGraph. Anyway, for now.. keep using the Hugging Face agents and LangChain system.
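To make the three-step flow above concrete, here is a minimal sketch of the belief-state exchange through MongoDB using `pymongo`. The database name, the collection names (`belief_states`, `conversations`), and the document fields are my own assumptions for illustration; the real schema may differ.

```python
# Minimal sketch of the Talker/Reasoner belief-state exchange via MongoDB.
# All names below (database, collections, fields) are assumed for illustration.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["waifu_agent"]  # hypothetical database name


def talker_delegate(session_id: str, belief: dict, history: list[dict]) -> None:
    """Step 1: the Talker persists its belief state and conversation history
    so the Reasoner can pick the task up."""
    db.belief_states.insert_one({
        "session_id": session_id,
        "role": "talker",
        "belief": belief,
        "created_at": datetime.now(timezone.utc),
    })
    db.conversations.update_one(
        {"session_id": session_id},
        {"$set": {"history": history}},
        upsert=True,
    )


def reasoner_step(session_id: str, reason_fn) -> dict:
    """Step 2: the Reasoner reads the Talker's belief state and history,
    runs its reasoning steps, and writes its own belief state back."""
    talker_doc = db.belief_states.find_one(
        {"session_id": session_id, "role": "talker"},
        sort=[("created_at", -1)],  # latest Talker belief
    )
    history = db.conversations.find_one({"session_id": session_id})["history"]
    new_belief = reason_fn(talker_doc["belief"], history)  # plug in the Reasoner agent here
    db.belief_states.insert_one({
        "session_id": session_id,
        "role": "reasoner",
        "belief": new_belief,
        "created_at": datetime.now(timezone.utc),
    })
    return new_belief


def talker_answer_context(session_id: str) -> dict:
    """Step 3: the Talker reads the Reasoner's latest belief state and
    answers given that context."""
    doc = db.belief_states.find_one(
        {"session_id": session_id, "role": "reasoner"},
        sort=[("created_at", -1)],
    )
    return doc["belief"] if doc else {}
```

Appending belief states with a timestamp (rather than overwriting one document) keeps the history of delegations around, which should help when debugging how the Talker and Reasoner hand off a task.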
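For the decoupling itself, one option is a thin HTTP layer between the two agents. Below is a hedged FastAPI sketch; the endpoint paths, the payload shape, and the hypothetical `belief_store` module (the helpers from the previous sketch) are all assumptions, not the project's actual interface.

```python
# Hypothetical API surface decoupling the Talker from the Reasoner.
from fastapi import FastAPI
from pydantic import BaseModel

# Assumed import: the MongoDB helpers from the previous sketch.
from belief_store import talker_delegate, reasoner_step, talker_answer_context

app = FastAPI()


def my_reasoner(belief: dict, history: list[dict]) -> dict:
    # Placeholder: call the actual Reasoner agent here.
    return {"plan": "...", "source_belief": belief, "turns_seen": len(history)}


class DelegateRequest(BaseModel):
    session_id: str
    belief: dict
    history: list[dict]


@app.post("/reasoner/delegate")
def delegate(req: DelegateRequest) -> dict:
    """The Talker hands a difficult task to the Reasoner service."""
    talker_delegate(req.session_id, req.belief, req.history)
    new_belief = reasoner_step(req.session_id, reason_fn=my_reasoner)
    return {"session_id": req.session_id, "belief": new_belief}


@app.get("/reasoner/belief/{session_id}")
def latest_belief(session_id: str) -> dict:
    """The Talker polls the Reasoner's latest belief state as answer context."""
    return talker_answer_context(session_id)
```

Run it with something like `uvicorn reasoner_api:app` (module name assumed); the Talker side then only needs an HTTP client, which should make a later migration to LangGraph easier since the storage and transport stay the same.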