The Chinese LLM maker just dropped a flurry of models, ensuring there will be a SOTA Qwen model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B and 7B, with 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B
And they haven't been slacking: performance is top of the game in every weight category!
ššš² š¢š§š¬š¢š š”šš¬:
š All models have šš®š“šø šš¼šøš²š» š°š¼š»šš²š š š¹š²š»š“ššµ (see the loading sketch at the end of this post)
š Models pre-trained on 18T tokens, even more than the 15T of Llama-3
š«š· On top of this, it šš®šøš²š ššµš² #š šš½š¼š š¼š» šŗšš¹šš¶š¹š¶š»š“šš®š¹ šš®ššøš, so it might become my go-to for French
š» Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!
š§® Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."
š Technical report to be released "very soon"
š All models come with the most permissive license, Apache 2.0, except the 72B models, which have a custom license saying roughly "you can use it for free EXCEPT if your product has over 100M users"
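
For anyone who wants to kick the tires, here is a minimal sketch of loading one of the checkpoints with š¤ transformers. The Hub id "Qwen/Qwen2.5-7B-Instruct" is an assumption based on the naming of previous Qwen releases, so double-check it once the weights are live:

```python
# Minimal sketch: load a Qwen2.5 instruct checkpoint and chat with it.
# NOTE: the repo id below is an assumption based on earlier Qwen naming,
# not something confirmed by the release post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hugging Face Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Context window as declared in the model config
# (the post advertises 128k tokens).
print(model.config.max_position_embeddings)

messages = [
    {"role": "user", "content": "Summarize the Qwen2.5 lineup in one sentence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swap the id for any of the sizes above; the Coder and Math variants should follow the same pattern.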