Wok

AI & ML interests

🦆

Recent Activity

liked a Space 4 days ago
jamesliu1217/EasyControl_Ghibli
liked a Space 7 days ago
black-forest-labs/FLUX.1-Redux-dev
liked a Space about 1 month ago
linoyts/scribble-sdxl-flash

Organizations

Spaces-explorers

Wok's activity

reacted to m-ric's post with 🔥 7 months ago
🔥 Qwen releases their 2.5 family of models: new SOTA for all sizes up to 72B!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

Key insights:

🌐 All models have 128k token context length

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship Qwen2.5-72B is roughly competitive with Llama-3.1-405B, and has a 3-5% margin over Llama-3.1-70B on most benchmarks.

🇫🇷 On top of this, it takes the #1 spot on multilingual tasks, so it might become my standard for French.

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder-33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models use the most permissive license, Apache 2.0, except the 72B models, which have a custom license saying roughly "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e
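
If you want to try one of these checkpoints, here is a minimal sketch using transformers (the 7B-Instruct model ID follows the family naming above; dtype/device handling is simplified and memory needs depend on your hardware):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Quick test of a Qwen2.5 instruct checkpoint; swap in any size from the list above.
model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen2.5 release in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))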
reacted to Tonic's post with 🔥 7 months ago
reacted to Tonic's post with 👀 7 months ago
reacted to KingNish's post with 🔥 7 months ago
I am experimenting with Flux and trying to push it to its limits without training (as I am GPU-poor 😅).
I found some flaws in the pipelines, which I resolved, and now I can generate images of roughly the same quality as 4-step Flux Schnell in just 1 step.
Demo Link:
KingNish/Realtime-FLUX
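
(The author's pipeline fixes aren't shared here as code; below is only a baseline sketch of low-step Schnell inference with diffusers so the 1-step vs. 4-step comparison is concrete. The prompt and output paths are placeholders.)

import torch
from diffusers import FluxPipeline

# Baseline FLUX.1-schnell inference; Schnell is distilled, so guidance is disabled.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a duck wearing sunglasses, studio lighting"
for steps in (4, 1):
    image = pipe(prompt, num_inference_steps=steps, guidance_scale=0.0).images[0]
    image.save(f"flux_schnell_{steps}_steps.png")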

reacted to fffiloni's post with 👀 7 months ago
🇫🇷
What impact is AI having on the film, audiovisual, and video game industries?
A forward-looking study for industry professionals
— CNC & BearingPoint | 09/04/2024

While Artificial Intelligence (AI) has long been used in the film, audiovisual, and video game sectors, the new applications of generative AI are upending our view of what a machine is capable of and carry an unprecedented potential for transformation. The quality of their output is striking, and they have consequently sparked many debates, somewhere between expectation and apprehension.

The CNC has therefore decided to launch a new AI Observatory in order to better understand how AI is used and its real impact on the image industries. As part of this Observatory, the CNC wanted to draw up an initial overview by mapping current and potential uses of AI at each stage of a work's creation and distribution, identifying the associated opportunities and risks, particularly in terms of professions and employment. This CNC / BearingPoint study presented its main findings on March 6, during the CNC conference "Creating, producing, and distributing in the age of artificial intelligence".

The CNC is publishing the expanded version of this mapping of AI usage in the film, audiovisual, and video game industries.

Link to the full mapping: https://www.cnc.fr/documents/36995/2097582/Cartographie+des+usages+IA_rapport+complet.pdf/96532829-747e-b85e-c74b-af313072cab7?t=1712309387891
reacted to sayakpaul's post with 🔥 8 months ago
Flux.1-Dev-like images, but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged

Enjoy the Monday 🤗
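
A hedged sketch of loading that merged checkpoint with diffusers, assuming it is published in standard FluxPipeline format; the step count and guidance value below are illustrative, not the author's recommended settings:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "sayakpaul/FLUX.1-merged", torch_dtype=torch.bfloat16
).to("cuda")

# Fewer steps than the ~50 typically used with FLUX.1-Dev (illustrative value).
image = pipe(
    "a cozy reading nook at golden hour",
    num_inference_steps=8,
    guidance_scale=3.5,
).images[0]
image.save("flux_merged.png")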
reacted to gokaygokay's post with 👍🔥 9 months ago
reacted to gokaygokay's post with 👍 9 months ago
reacted to gokaygokay's post with 👍 9 months ago
I've created a Stable Diffusion 3 (SD3) image generation space for convenience. Now you can:

1. Generate SD3 prompts from images
2. Enhance your text prompts (turn 1-2 words into full SD3 prompts)

https://huggingface.co/spaces/gokaygokay/SD3-with-VLM-and-Prompt-Enhancer

These features are based on my custom models:

- VLM captioner for prompt generation:
  - gokaygokay/sd3-long-captioner

- Prompt Enhancers for SD3 Models:
  - gokaygokay/Lamini-Prompt-Enchance-Long
  - gokaygokay/Lamini-Prompt-Enchance

You can now simplify your SD3 workflow with these tools!
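
For example, a rough sketch of calling the prompt enhancer directly, assuming it loads as a standard text-to-text (T5-style) checkpoint; the short prompt is just a placeholder:

from transformers import pipeline

# Direct use of the prompt-enhancer model listed above (assumed text2text architecture).
enhancer = pipeline("text2text-generation", model="gokaygokay/Lamini-Prompt-Enchance")

short_prompt = "rainy tokyo street at night"
result = enhancer(short_prompt, max_new_tokens=128)
print(result[0]["generated_text"])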
reacted to gokaygokay's post with 👍🤯🔥 9 months ago
reacted to sanchit-gandhi's post with 👀 about 1 year ago
Why does returning timestamps help Whisper reduce hallucinations? 🧐

Empirically, most practitioners have found that setting return_timestamps=True helps reduce hallucinations, particularly when doing long-form evaluation with Transformers’ “chunked” algorithm.

But why does this work?

My interpretation is that forcing the model to predict timestamps is contradictory to hallucinations. Suppose you have the transcription:
The cat sat on the on the on the mat.

Here we have a repeated hallucination of “on the”. If we ask the model to predict timestamps, then the “on the” has to contribute to the overall segment-level timing, e.g.:
<|0.00|> The cat sat on the on the on the mat.<|5.02|>

However, it’s impossible to fit 3 copies of “on the” within the time allocation given to the segment, so the probability for this hallucinatory sequence becomes lower, and the model actually predicts the correct transcription with highest probability:
<|0.00|> The cat sat on the mat.<|5.02|>

In this sense, the end timestamp is the opposite of the initial timestamp constraint described in Section 4.5 of the paper Robust Speech Recognition via Large-Scale Weak Supervision (2212.04356): it helps the model remove extra words at the end of the sequence (whereas the initial timestamp helps when the model ignores words at the start), but the overall principle is the same (using timestamps to improve the probability of more realistic sequences).
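
For concreteness, a minimal sketch of chunked long-form transcription with timestamps enabled via the Transformers ASR pipeline (the model ID and audio path are just examples):

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,  # enables the "chunked" long-form algorithm
)

result = asr("audio.mp3", return_timestamps=True)
print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])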

Leaving it open to you: why do you think timestamps reduce Whisper hallucinations?
New activity in CompVis/stable-diffusion-safety-checker over 1 year ago

Fix imports

#45 opened over 1 year ago by Wok