Does anyone have code for streaming real-time chat?
As the title says. I don't want to use it by uploading a recording and getting a single reply back.
Have a look at the MiniCPM-o code; you'll probably get more or less the answer there.
Please, someone upload the MLX-compatible version :)
You would need to implement streaming input from the webcam!
You might also be able to use browser use?
Here's my MiniCPM-o streaming attempt.
For this to work with Qwen, Qwen would need the special function MiniCPM-o has, `model.streaming_prefill(...)`.
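For anyone unfamiliar with that API, here is a minimal sketch of the chunked-prefill pattern it implies. `streaming_prefill` / `streaming_generate` are the real MiniCPM-o method names, but everything else below (the stub class, the session id, the fake chunks) is illustrative, not the real library:

```python
# Sketch of the chunked-prefill pattern (MiniCPM-o style). The stub class
# stands in for the real model so the control flow is runnable; the real
# methods encode audio/video chunks into the KV cache incrementally.

class StubOmniModel:
    """Illustrative stand-in for a streaming omni model."""
    def __init__(self):
        self.prefilled = []          # simulated KV-cache contents

    def streaming_prefill(self, session_id, chunk):
        # Real model: encode this chunk and extend the KV cache.
        self.prefilled.append((session_id, chunk))

    def streaming_generate(self, session_id):
        # Real model: decode tokens against everything prefilled so far.
        n = sum(1 for sid, _ in self.prefilled if sid == session_id)
        yield from (f"tok{i}" for i in range(n))

model = StubOmniModel()
session = "demo"

# Feed the input in ~1 s chunks as they arrive, instead of one big upload.
for chunk in ["audio_0-1s", "audio_1-2s", "audio_2-3s"]:
    model.streaming_prefill(session, chunk)

reply = list(model.streaming_generate(session))
print(reply)   # one token per prefilled chunk in this stub
```

The point is that prefill happens per chunk while the user is still talking, so generation can start almost immediately after the last chunk.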
+1, it would be great to see a streaming-output example. It looks like the code for `Qwen2_5OmniModel.generate` first generates from the thinker and then from the talker, so long generations would have a significant delay. But the audio chat on chat.qwen.ai does not seem to have latency that depends on output length.
There could be a misunderstanding on your side. It does indeed generate from the thinker and then the talker, but that doesn't mean the thinker's generation has to be complete before it hands off to the talker. Think of it as the thinker generating a few tokens, then the talker starting to speak them while the thinker is working on future tokens. The communication between the two is streaming.
Yes, sorry, I understand that the talker can start outputting while the thinker is still generating. I just mean that the `generate` function in Hugging Face runs the two generations sequentially, and it would be cool to see demo code for streaming.
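The pipelined thinker/talker handoff being described can be sketched with a plain producer/consumer queue. The two generators below are dummies standing in for the real thinker and talker; only the pipelining pattern is the point:

```python
# Thinker pushes text tokens onto a queue as it produces them; talker
# consumes them concurrently instead of waiting for the full text.

import queue
import threading

SENTINEL = object()   # end-of-stream marker

def thinker(out_q):
    # Real thinker: autoregressive text generation, one token at a time.
    for tok in ["Hel", "lo", " wor", "ld"]:
        out_q.put(tok)
    out_q.put(SENTINEL)

def talker(in_q, audio_out):
    # Real talker: turns each text token into audio frames as soon as it
    # arrives, so speech starts before the thinker has finished.
    while True:
        tok = in_q.get()
        if tok is SENTINEL:
            break
        audio_out.append(f"audio({tok})")

q = queue.Queue()
audio = []
t1 = threading.Thread(target=thinker, args=(q,))
t2 = threading.Thread(target=talker, args=(q, audio))
t1.start(); t2.start()
t1.join(); t2.join()
print(audio)
```

With this structure, first-audio latency depends only on how fast the thinker emits its first few tokens, not on total output length, which would explain the behavior seen on chat.qwen.ai.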
So, to create the streaming webcam input:

This can be done with the snapshot approach, i.e., taking short snapshots of video from the webcam, or taking snapshots of audio from the microphone.

This would not be in the model! It would need to be implemented like the browser-use or computer-use plugins. If I remember correctly, computer use and browser use work by taking a series of pictures.

The model takes a few seconds of audio, or video with audio, as input. The input cannot be live to the model, but you can feed it a series of videos or pictures as a batch input. Currently, speed is the enemy!

But creating the streaming-input plugin or library should not be so hard, as it is basically a chain (LangChain) and a graph of actions to retrieve a response.

This is not part of the model, as models do not handle streams; they handle messages! So a mini-stream would be a batched message? A continuous input and a continuous output would crash many machines, between writing to the hard drive, reading it back, and feeding the input in for a response.

That means high CPU usage, high GPU usage, high RAM usage, and high disk activity: a very intensive task! (It can probably run in the cloud.)
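The snapshot idea above reduces to cutting a continuous capture into fixed-length blocks and sending each block to the model as an ordinary message. A minimal sketch, using a flat list of fake samples in place of real webcam/microphone capture (which would come from something like OpenCV or sounddevice):

```python
# Cut a continuous stream into fixed-duration blocks ("snapshots") that
# can each be sent as a normal batched message. The numbers are fake
# stand-ins for real 16 kHz mono audio capture.

def chunk_stream(samples, samples_per_block):
    """Split a flat sample list into fixed-size blocks (last may be short)."""
    return [samples[i:i + samples_per_block]
            for i in range(0, len(samples), samples_per_block)]

fake_audio = list(range(40000))          # 2.5 s of fake samples at 16 kHz
blocks = chunk_stream(fake_audio, 16000) # 1-second snapshots

print(len(blocks))        # 3 blocks: 1 s + 1 s + 0.5 s
print(len(blocks[-1]))    # trailing partial block
```

Each block then goes through the normal message pipeline, which is why this lives in a plugin layer rather than in the model itself.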
I don't think you can achieve the real-time input and real-time output in your title. So-called real-time input is always block input: you can only control the frequency of the input blocks, similar to how Tesla's self-driving processes its input.