Yi Cui
onekq
AI & ML interests
Benchmark, Code Generation Model
Recent Activity
posted an update about 4 hours ago
Qwen made good students, DeepSeek made a genius.
This is my summary of their differences. I don't think these two players are coordinated, but they both have clear goals: one is building an ecosystem, the other is pushing toward AGI.
And IMO they are both doing really well.
replied to their post about 4 hours ago
The performance of deepseek-r1-distill-qwen-32b is abysmal. I know Qwen instruct (not coder) is quite poor at coding. As such, I have low expectations for other R1 reproduction works that are also based on Qwen instruct. https://huggingface.co/collections/onekq-ai/r1-reproduction-works-67a93f2fb8b21202c9eedf0b
This makes it particularly mysterious what went into QwQ-32B. Why did it work so well? Was it trained from scratch? Does anyone have insights about this?
https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard
updated a model about 5 hours ago
onekq-ai/OneSQL-v0.1-Qwen-32B-GGUF

replied to their post about 4 hours ago
Ah, I see. Thanks!
Still, the blog post didn't mention what the base model is (if any).

replied to their post about 22 hours ago
Cool! I will check it out.
What I meant by switching is this: sometimes I'm not satisfied with a ChatGPT answer and realize it needs to think harder. So I switch to o1 and ask again, and most of the time the answer gets better. Then I ask a simple follow-up question, which o1 overanalyzes, so I have to switch back to gpt-4o. I don't have the foresight to know which model fits my question best; I only know after I read the answer, which is too late.
Now imagine a conversation with a human expert. A human can do this kind of switching remarkably well, hence a good conversation. This could actually be a metric to gauge an applicant's experience.

posted an update 1 day ago
The performance of deepseek-r1-distill-qwen-32b is abysmal. I know Qwen instruct (not coder) is quite poor at coding. As such, I have low expectations for other R1 reproduction works that are also based on Qwen instruct.
onekq-ai/r1-reproduction-works-67a93f2fb8b21202c9eedf0b
This makes it particularly mysterious what went into QwQ-32B. Why did it work so well? Was it trained from scratch? Does anyone have insights about this?
onekq-ai/WebApp1K-models-leaderboard

posted an update 3 days ago
A bigger and harder pain point for reasoning models is switching modes.
We now have powerful models capable of either System 1 thinking or System 2 thinking, but not both, much less switching between the two. Humans, however, can do this quite easily.
ChatGPT and others push the burden of switching between models onto users. I guess this is the best we have for now.
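To make the switching burden concrete, here is a toy sketch of what an automatic router might look like. The model names and the keyword heuristic are purely illustrative assumptions on my part, not a real API or a claim about how any vendor does it:

```python
# Toy heuristic router: pick a fast ("System 1") model or a slow reasoning
# ("System 2") model per prompt. Model names and keywords are hypothetical.
REASONING_HINTS = ("prove", "step by step", "debug", "optimize", "why")

def pick_model(prompt: str) -> str:
    """Route to a reasoning model only when the prompt looks hard."""
    p = prompt.lower()
    # Long prompts or prompts with "hard" keywords get the deliberate model.
    if any(hint in p for hint in REASONING_HINTS) or len(p.split()) > 60:
        return "o1"      # slow, deliberate reasoning model
    return "gpt-4o"      # fast conversational model

print(pick_model("What's the capital of France?"))  # -> gpt-4o
print(pick_model("Prove this loop terminates."))    # -> o1
```

Of course, a static keyword list is exactly the kind of foresight humans don't need; the point of the post stands.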

posted an update 6 days ago
QwQ-32B is amazing!
It ranks below o1-preview, but beats DeepSeek v3 and all Gemini models.
onekq-ai/WebApp1K-models-leaderboard
Now that we have such a powerful model that fits on a single GPU, can someone finetune a web app model to push the SOTA of my leaderboard? 🤗

posted an update 7 days ago
From my own experience, these are the pain points for reasoning model adoption:
(1) Expensive and, even worse, slow, due to excessive token output. You need to 10x your max output length to avoid clipping the thinking process.
(2) You have to filter out thinking tokens to retrieve the final output. For mature workflows, this means broad or deep refactoring.
1p vendors (open-source and proprietary) can ease these pain points because they control their own models, but the problems are exposed when the reasoning model is hosted by 3p MaaS providers.
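Pain point (2) can be sketched in a few lines, assuming the model wraps its chain of thought in `<think>...</think>` tags (the convention DeepSeek-R1 uses); other models may use different delimiters, so treat this as a sketch rather than a universal fix:

```python
import re

# Strip reasoning tokens from a raw completion, assuming the chain of
# thought is wrapped in <think>...</think> (DeepSeek-R1 convention).
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(completion: str) -> str:
    """Return only the final answer, with thinking blocks removed."""
    return THINK_RE.sub("", completion).strip()

raw = "<think>The user wants 2+2. That is 4.</think>The answer is 4."
print(strip_thinking(raw))  # -> The answer is 4.
```

This is trivial for a single call, which is the point: the real cost is threading this filtering (and the 10x output budget) through every stage of a mature pipeline.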