Does anyone have code for streaming real-time chat?
As the title says. I don't want to use it by uploading a recording and getting a single reply back.
Have a look at the MiniCPM-o code; you'll probably get more or less the answer there.
Please, someone upload the MLX-compatible version :)
You would need to implement streaming input from the webcam!
You might also be able to use browser use?
Here's my MiniCPM-o streaming attempt.
For this to work with Qwen, Qwen would need the special function MiniCPM-o has, `model.streaming_prefill(...)`.
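For anyone unfamiliar with that API, here is a minimal sketch of the chunked-prefill pattern it implies. `streaming_prefill` / `streaming_generate` are the real MiniCPM-o method names, but everything else below (the stub class, the session id, the fake chunks) is illustrative, not the real library:

```python
# Sketch of the chunked-prefill pattern (MiniCPM-o style). The stub class
# stands in for the real model so the control flow is runnable; the real
# methods encode audio/video chunks into the KV cache incrementally.

class StubOmniModel:
    """Illustrative stand-in for a streaming omni model."""
    def __init__(self):
        self.prefilled = []          # simulated KV-cache contents

    def streaming_prefill(self, session_id, chunk):
        # Real model: encode this chunk and extend the KV cache.
        self.prefilled.append((session_id, chunk))

    def streaming_generate(self, session_id):
        # Real model: decode tokens against everything prefilled so far.
        n = sum(1 for sid, _ in self.prefilled if sid == session_id)
        yield from (f"tok{i}" for i in range(n))

model = StubOmniModel()
session = "demo"

# Feed the input in ~1 s chunks as they arrive, instead of one big upload.
for chunk in ["audio_0-1s", "audio_1-2s", "audio_2-3s"]:
    model.streaming_prefill(session, chunk)

reply = list(model.streaming_generate(session))
print(reply)   # one token per prefilled chunk in this stub
```

The point is that prefill happens per chunk while the user is still talking, so generation can start almost immediately after the last chunk.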
+1, it would be great to see a streaming-output example. It looks like the code for `Qwen2_5OmniModel.generate` first generates from the thinker and then from the talker, so long generations would have a significant delay. But the audio chat on chat.qwen.ai does not seem to have latency that depends on output length.
There could be a misunderstanding on your side. It does indeed generate from the thinker and then the talker, but that doesn't mean the thinker's generation has to be complete before it hands off to the talker. Think of it as the thinker generating a few tokens, then the talker starting to speak them while the thinker is working on future tokens. The communication between the two is streaming.
Yes, sorry, I understand that the talker can start outputting while the thinker is still generating. I just mean that the `generate` function in Hugging Face runs the two generations sequentially, and it would be cool to see demo code for streaming.
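The pipelined thinker/talker handoff being described can be sketched with a plain producer/consumer queue. The two generators below are dummies standing in for the real thinker and talker; only the pipelining pattern is the point:

```python
# Thinker pushes text tokens onto a queue as it produces them; talker
# consumes them concurrently instead of waiting for the full text.

import queue
import threading

SENTINEL = object()   # end-of-stream marker

def thinker(out_q):
    # Real thinker: autoregressive text generation, one token at a time.
    for tok in ["Hel", "lo", " wor", "ld"]:
        out_q.put(tok)
    out_q.put(SENTINEL)

def talker(in_q, audio_out):
    # Real talker: turns each text token into audio frames as soon as it
    # arrives, so speech starts before the thinker has finished.
    while True:
        tok = in_q.get()
        if tok is SENTINEL:
            break
        audio_out.append(f"audio({tok})")

q = queue.Queue()
audio = []
t1 = threading.Thread(target=thinker, args=(q,))
t2 = threading.Thread(target=talker, args=(q, audio))
t1.start(); t2.start()
t1.join(); t2.join()
print(audio)
```

With this structure, first-audio latency depends only on how fast the thinker emits its first few tokens, not on total output length, which would explain the behavior seen on chat.qwen.ai.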
So, to create the streaming webcam input:

This can be done with the snapshot approach, i.e., taking short snapshots of video from the webcam, or taking snapshots of audio from the microphone.

This would not be in the model! It would need to be implemented like the browser-use or computer-use plugins. If I remember correctly, computer use and browser use work by taking a series of pictures.

The model takes a few seconds of audio, or video with audio, as input. The input cannot be live to the model, but you can feed it a series of videos or pictures as a batch input. Currently, speed is the enemy!

But creating the streaming-input plugin or library should not be so hard, as it is basically a chain (LangChain) and a graph of actions to retrieve a response.

This is not part of the model, as models do not handle streams; they handle messages! So a mini-stream would be a batched message? A continuous input and a continuous output would crash many machines, between writing to the hard drive, reading it back, and feeding the input in for a response.

That means high CPU usage, high GPU usage, high RAM usage, and high disk activity: a very intensive task! (It can probably run in the cloud.)
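The snapshot idea above reduces to cutting a continuous capture into fixed-length blocks and sending each block to the model as an ordinary message. A minimal sketch, using a flat list of fake samples in place of real webcam/microphone capture (which would come from something like OpenCV or sounddevice):

```python
# Cut a continuous stream into fixed-duration blocks ("snapshots") that
# can each be sent as a normal batched message. The numbers are fake
# stand-ins for real 16 kHz mono audio capture.

def chunk_stream(samples, samples_per_block):
    """Split a flat sample list into fixed-size blocks (last may be short)."""
    return [samples[i:i + samples_per_block]
            for i in range(0, len(samples), samples_per_block)]

fake_audio = list(range(40000))          # 2.5 s of fake samples at 16 kHz
blocks = chunk_stream(fake_audio, 16000) # 1-second snapshots

print(len(blocks))        # 3 blocks: 1 s + 1 s + 0.5 s
print(len(blocks[-1]))    # trailing partial block
```

Each block then goes through the normal message pipeline, which is why this lives in a plugin layer rather than in the model itself.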
I don't think you can achieve the real-time input and real-time output in your title. So-called real-time input is always block input: you can only control the frequency of the input blocks, similar to how Tesla's self-driving processes its input.