Update README.md
#4, opened by BarraHome

README.md CHANGED
@@ -26,7 +26,7 @@ Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse moda
 
 ### Key Features
 
-* **Omni and Novel Architecture**: We propose Thinker-Talker architecture, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. We
+* **Omni and Novel Architecture**: We propose Thinker-Talker architecture, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. We propose a novel position embedding, named TMRoPE (Time-aligned Multimodal RoPE), to synchronize the timestamps of video inputs with audio.
 
 * **Real-Time Voice and Video Chat**: Architecture Designed for fully real-time interactions, supporting chunked input and immediate output.
 