"the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies". DeepSeek researchers are so based🔥
They had an “aha moment”. A key takeaway from this is to always try out new ideas from first principles.
Paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
Code: https://github.com/deepseek-ai/DeepSeek-R1
Weights: deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d