S1-M-7B-Beta

🏠 Homepage | 👍 Our Official Code Repo | 🤗 S1-M Dataset (Beta)

S1-M-7B-Beta is used for developing the algorithm "Simple Test-time Scaling in Multimodal Reasoning". It was obtained by fine-tuning the base model Qwen/Qwen2-VL-7B-Instruct on data containing the thinking tags <think> and </think>. Through this fine-tuning, the model acquired a think-first, then-respond paradigm, which allows for experiments on "Test-time Scaling".
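Because the model is trained to emit its reasoning between <think> and </think> before the final response, downstream experiments usually need to separate the two parts. The following is a minimal sketch of that step, not code from the official repository: the helper name `split_thinking` and the sample string are hypothetical, and it assumes the generated text contains at most one complete thinking block.

```python
import re
from typing import Tuple

# Matches the first complete <think>...</think> block, including newlines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> Tuple[str, str]:
    """Split generated text into (thinking, response).

    Hypothetical helper: assumes the thinking block, if present,
    precedes the response.
    """
    match = THINK_PATTERN.search(output)
    if match is None:
        # No complete thinking block: treat the whole text as the response.
        return "", output.strip()
    thinking = match.group(1).strip()
    response = output[match.end():].strip()
    return thinking, response


# Illustrative string, not real model output.
sample = "<think>The image shows two apples and one pear.</think>There are three fruits."
thinking, response = split_thinking(sample)
print("thinking:", thinking)
print("response:", response)
```

Measuring the length of the extracted thinking part (for example in tokens) is one way to relate reasoning length to answer quality in a test-time scaling experiment.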

Note: The current model is a development version, not the final official version.


Model tree for PKU-Alignment/s1-m_7b_beta

Base model: Qwen/Qwen2-VL-7B
Finetuned: this model