Two LoRA cold-start SFT experiments that teach structured think/answer reasoning to Nanbeige4-3B-Base, using reasoning traces distilled from frontier models.
Mrinaal Arora
mrinaalarora
Recent Activity
updated a Space about 9 hours ago: mrinaalarora/qwen3-1.7b-grpo-62s-4r-2run
published a Space about 9 hours ago: mrinaalarora/qwen3-1.7b-grpo-62s-4r-2run
updated a model about 9 hours ago: mrinaalarora/wordle-grpo-Qwen3-1.7B
Organizations
None yet