
Fatih C. Akyon

fcakyon

AI & ML interests

multi-modal learning, video understanding

Organizations

Deprem Yapay Zeka, Yapay Zekâ Araştırma İnisiyatifi, Radiology-ai, OBSS, fixit, Gradio-Blocks-Party, ultralytics+, Video Transformers, Viddexa AI, Ultralytics

fcakyon's activity

New activity in microsoft/Florence-2-base 12 days ago

confidence score

2
#24 opened 14 days ago by fcakyon
New activity in openbmb/MiniCPM-Llama3-V-2_5 12 days ago
New activity in WalidBouss/LeGrad 14 days ago
liked a Space 14 days ago
reacted to joaogante's post with 🤗 14 days ago
New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥

Are you tired of top_k arbitrarily discarding high-quality continuations? Or of top_p forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p flag in generate, fresh from a PR merged today! 🥬

Min P is a dynamic token filter -- as opposed to Top K, which keeps the K most likely tokens, and Top P, which keeps the most likely tokens up to a fixed cumulative probability, both of which are static filters. Min P takes a base probability (set with the min_p flag) and multiplies it by the probability of the most likely token in the next-token distribution. All tokens with probability below the resulting value are filtered out. What happens with this strategy?
👉 High-probability token present -> aggressive filter (we don't want to miss out on that high-probability case and risk derailing generation)
👉 No high-probability token present -> relaxed filter (there are many continuations that the model finds plausible)

You should set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.

Kudos to @kalomaze and @menhguin for creating this technique 🔥 Read their discussion in the original issue for benchmarks (https://github.com/huggingface/transformers/issues/27670)

Copy-pasteable version of the example in the image below here: https://pastebin.com/VqXNtuxd
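
A minimal sketch of such a call with 🤗 transformers (the model name, prompt, and exact parameter values are illustrative, not taken from the original example):

```python
# Minimal sketch: Min P sampling via transformers' generate().
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt")

# do_sample=True enables sampling; min_p drops every token whose probability is
# below min_p * P(most likely next token). temperature > 1 pairs well with it
# for creative text, as noted above.
outputs = model.generate(
    **inputs,
    do_sample=True,
    min_p=0.08,
    temperature=1.3,
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```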

Have fun experimenting! 😎
reacted to joaogante's post with 🔥 14 days ago
Adding a long prompt can help you fight LLM hallucinations. However, if you know exactly how you want your LLM output constrained, there are much better strategies! 💪

Did you know you can force your LLM to ALWAYS generate a valid JSON file? Or to follow a well-defined answer template? You can do that and more with the 🤗 transformers-compatible outlines library.

It doesn't just let you take control of your LLM -- your text generation application will also become faster! 🔥 The more constrained your text generation is, the bigger the speedups you'll see!
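
A minimal sketch of the idea, assuming the outlines API as documented (the model name, prompt, and schema are illustrative, not from the post):

```python
# Minimal sketch: force a transformers model to emit JSON matching a schema,
# using the outlines library. Model name and schema are illustrative.
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Person)

# The output is guaranteed to parse into the Person schema.
person = generator("Extract the person: John is 30 years old. Answer in JSON:")
print(person)
```

Because invalid tokens are masked out during decoding, the constraint is enforced as the text is generated rather than checked afterwards -- which is also where the speedups come from.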

Follow @remi and other outlines folks to stay on top of the constrained generation game 🧠
New activity in microsoft/Florence-2-large 14 days ago

add_confidence_score

3
#56 opened 6 months ago by haipingwu
New activity in Ultralytics/YOLOv5 15 days ago

Enable download stats

1
#2 opened 16 days ago by merve
New activity in Ultralytics/YOLOv8 15 days ago

Enable download stats

1
#1 opened 16 days ago by merve
New activity in Ultralytics/YOLO11 15 days ago

Update library name

1
#1 opened 16 days ago by merve
New activity in thwri/CogFlorence-2 21 days ago
New activity in BAAI/bge-m3 25 days ago

broken link in the model card

1
#99 opened 25 days ago by fcakyon
reacted to merve's post with 🤗 30 days ago
Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B outperforms most 30B models 🧶

✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 sizes with an Apache 2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️

Try the demo for the best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
They evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find that design decisions validated on small models also scale properly when the model and dataset are scaled up 📈 scaling the dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies and find that FPS sampling is better than uniform sampling, with 8-32 tokens per frame being optimal
> They also compare image encoders, trying a range of models from shape-optimized SigLIP to DINOv2, and find google/siglip-so400m-patch14-384 to be the most powerful 🔥
> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best yield

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo-7B outperforms most 30B models 🔥