Wok


AI & ML interests

🦆

Recent Activity

liked a Space 1 day ago
jamesliu1217/EasyControl_Ghibli
liked a Space 5 days ago
black-forest-labs/FLUX.1-Redux-dev
liked a Space about 1 month ago
linoyts/scribble-sdxl-flash

Organizations

Spaces-explorers

Wok's activity

reacted to m-ric's post with 🔥 7 months ago
🔥 Qwen releases their 2.5 family of models: New SOTA for all sizes up to 72B!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

Key insights:

🌐 All models have 128k token context length

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship Qwen2.5-72B is ~competitive with Llama-3.1-405B, and has a 3-5% margin on Llama-3.1-70B on most benchmarks.

🇫🇷 On top of this, it takes the #1 spot on multilingual tasks, so it might become my standard for French.

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models have the most permissive Apache 2.0 license, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e
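
For reference, a quick sketch of running one of the new instruct checkpoints with transformers. The model id is assumed to follow the usual Qwen naming (Qwen/Qwen2.5-7B-Instruct); swap in whichever size you need from the collection above.

# Sketch: generate with a Qwen2.5 instruct checkpoint via transformers.
# The model id below is an assumption based on the usual Qwen naming scheme.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a one-line summary of the Qwen2.5 release."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))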
reacted to Tonic's post with 🔥 7 months ago
reacted to Tonic's post with 👀 7 months ago
reacted to KingNish's post with 🔥 7 months ago
I am experimenting with Flux and trying to push it to its limits without training (as I am GPU-poor 😅).
I found some flaws in the pipelines, which I resolved, and now I can generate an image of roughly the same quality as Flux Schnell at 4 steps in just 1 step.
Demo Link:
KingNish/Realtime-FLUX
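
For comparison, a minimal sketch of what single-step sampling looks like with the stock diffusers FluxPipeline; this is the unmodified pipeline, not the tweaked one behind the demo.

# Sketch: stock diffusers FluxPipeline run at a single denoising step
# (the plain pipeline, not the modified one used in the demo above).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a cozy cabin in a snowy forest at dusk",
    num_inference_steps=1,   # single step, as in the post
    guidance_scale=0.0,      # Schnell is timestep-distilled and runs without CFG
).images[0]
image.save("flux_one_step.png")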

reacted to fffiloni's post with 👀 7 months ago
🇫🇷
What impact is AI having on the film, audiovisual, and video game industries?
A forward-looking study for industry professionals
— CNC & BearingPoint | 09/04/2024

While Artificial Intelligence (AI) has long been used in the film, audiovisual, and video game sectors, the new generative AI applications are upending our view of what a machine is capable of and carry an unprecedented potential for transformation. The quality of their output is striking, and they consequently spark many debates, caught between expectations and apprehensions.

The CNC has therefore decided to launch a new AI Observatory in order to better understand how AI is used and its real impact on the image industry. As part of this Observatory, the CNC wanted to draw up an initial assessment by mapping the current or potential uses of AI at every stage of the creation and distribution of a work, identifying the associated opportunities and risks, particularly in terms of professions and employment. The main findings of this CNC / BearingPoint study were presented on March 6, at the CNC event "Creating, producing, distributing in the age of artificial intelligence".

The CNC is publishing the expanded version of its mapping of AI uses in the film, audiovisual, and video game industries.

Link to the full mapping: https://www.cnc.fr/documents/36995/2097582/Cartographie+des+usages+IA_rapport+complet.pdf/96532829-747e-b85e-c74b-af313072cab7?t=1712309387891
reacted to sayakpaul's post with 🔥 8 months ago
Flux.1-Dev-like images but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged

Enjoy the Monday 🤗
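
A minimal sketch of loading the merged checkpoint with diffusers and sampling with a reduced step count; the repo id comes from the post, while the step count below is just an illustrative guess.

# Sketch: sample from the merged checkpoint with a reduced step count.
# The repo id is taken from the post; the step count is an illustrative guess.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "sayakpaul/FLUX.1-merged", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=8,   # fewer steps than FLUX.1-Dev typically needs
    guidance_scale=3.5,
).images[0]
image.save("flux_merged.png")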
reacted to gokaygokay's post with 👍🔥 9 months ago
reacted to gokaygokay's post with 👍 9 months ago
reacted to gokaygokay's post with 👍 9 months ago
I've created a Stable Diffusion 3 (SD3) image generation space for convenience. Now you can:

1. Generate SD3 prompts from images
2. Enhance your text prompts (turn 1-2 words into full SD3 prompts)

https://huggingface.co/spaces/gokaygokay/SD3-with-VLM-and-Prompt-Enhancer

These features are based on my custom models:

- VLM captioner for prompt generation:
- gokaygokay/sd3-long-captioner

- Prompt Enhancers for SD3 Models:
- gokaygokay/Lamini-Prompt-Enchance-Long
- gokaygokay/Lamini-Prompt-Enchance

You can now simplify your SD3 workflow with these tools!
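
If you want to call the prompt enhancer outside the Space, here is a minimal sketch assuming the checkpoint loads as a standard text2text-generation (seq2seq) model; check the model card for any expected input prefix.

# Sketch: run the prompt enhancer locally, assuming it is a standard
# seq2seq checkpoint usable via the text2text-generation pipeline.
from transformers import pipeline

enhancer = pipeline("text2text-generation", model="gokaygokay/Lamini-Prompt-Enchance-Long")

short_prompt = "cat in a garden"
result = enhancer(short_prompt, max_new_tokens=256)
print(result[0]["generated_text"])  # expanded, SD3-ready prompt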
reacted to gokaygokay's post with 👍🤯🔥 9 months ago
reacted to sanchit-gandhi's post with 👀 about 1 year ago
Why does returning timestamps help Whisper reduce hallucinations? 🧐

Empirically, most practitioners have found that setting return_timestamps=True helps reduce hallucinations, particularly when doing long-form evaluation with Transformers' "chunked" algorithm.

But why does this work?

My interpretation is that forcing the model to predict timestamps is contradictory to hallucinations. Suppose you have the transcription:
The cat sat on the on the on the mat.

Where we have a repeated hallucination for "on the". If we ask the model to predict timestamps, then the "on the" has to contribute to the overall segment-level timing, e.g.:
<|0.00|> The cat sat on the on the on the mat.<|5.02|>

However, it's impossible to fit 3 copies of "on the" within the time allocation given to the segment, so the probability for this hallucinatory sequence becomes lower, and the model actually predicts the correct transcription with highest probability:
<|0.00|> The cat sat on the mat.<|5.02|>

In this sense, the end timestamp is the opposite of the initial timestamp constraint they describe in Section 4.5 of the paper Robust Speech Recognition via Large-Scale Weak Supervision (2212.04356) → it helps the model remove extra words at the end of the sequence (rather than the initial timestamp, which helps when the model ignores words at the start), but the overall principle is the same (using timestamps to improve the probability of more realistic sequences).

Leaving it open to you: why do you think timestamps reduce Whisper hallucinations?
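
For concreteness, a minimal sketch of the setup the post is describing: Transformers' chunked long-form pipeline with timestamp prediction enabled (the checkpoint and audio path are just placeholders).

# Sketch: chunked long-form transcription with return_timestamps=True,
# the setting the post argues suppresses repeated-phrase hallucinations.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # any Whisper checkpoint works here
    chunk_length_s=30,                # enables the "chunked" long-form algorithm
)

result = asr("long_audio.wav", return_timestamps=True)  # placeholder audio path
print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])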
New activity in CompVis/stable-diffusion-safety-checker over 1 year ago

Fix imports

#45 opened over 1 year ago by Wok