William Suffill

wsuff

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with ❤️ about 4 hours ago

Organizations

None yet

wsuff's activity

reacted to MikeDoes's post with 🔥 about 4 hours ago
🚀 We are quite excited to announce the Ai4Privacy Python library! 🎉

pip install ai4privacy to anonymize short English text with OpenPII Masking 500k labels

📊 Day 5/7 of PII Masking 1M announcements complete! ⏰
reacted to m-ric's post with ❤️ about 4 hours ago
🚀 DeepSeek R1 moment has come for GUI agents: Rule-based Reinforcement Learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers and using these to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified reward function that evaluates multiple responses from models, optimizing via policy algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses (a minimal sketch follows the list):
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?
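As a rough illustration of what such a rule-based reward could look like (a hedged sketch based only on the three criteria above, not UI-R1's actual code; the Action class, the <think>/<action> tags, and the 0-to-3 scoring are assumptions), here is a minimal Python version:

from dataclasses import dataclass

@dataclass
class Action:
    action_type: str   # e.g. "click", "scroll", "type"
    x: float = 0.0     # predicted click coordinates (only relevant for clicks)
    y: float = 0.0

def rule_based_reward(pred: Action, pred_text: str, gt: Action, gt_box: tuple) -> float:
    """Hypothetical UI-R1-style reward: action type + click coordinates + output format."""
    reward = 0.0
    # 1) Action type accuracy: does the predicted action match the ground truth?
    if pred.action_type == gt.action_type:
        reward += 1.0
    # 2) Coordinate accuracy (clicks only): is the predicted click inside the ground-truth box?
    if gt.action_type == "click":
        x1, y1, x2, y2 = gt_box
        if x1 <= pred.x <= x2 and y1 <= pred.y <= y2:
            reward += 1.0
    # 3) Output format: does the response articulate reasoning and a final action?
    if "<think>" in pred_text and "<action>" in pred_text:
        reward += 1.0
    return reward

In GRPO, a scalar reward like this would be computed for each of several sampled responses to the same prompt, and each response is then reinforced relative to the group average.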

Using just 136 carefully selected mobile tasks (compared to 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌍 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only on low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)
reacted to zamal's post with 👍 about 4 hours ago
DeepGit: Your GitHub Gold Digger! 💰🚀
Hey Hugging Face gang! Meet DeepGit, my open-source sidekick that rips through GitHub to snag repos that fit you. Done with dead-end searches? Me too. Built it with LangGraph and some dope tricks:
Embeddings grab the good stuff (HF magic, baby!)

Re-ranking nails the best picks

Snoops docs, code, and buzz in one slick flow

Drops a clean list of hidden gems 💎

Unearth that sneaky ML lib or Python gem: run python app.py or langgraph dev and boom! Peek it at https://github.com/zamalali/DeepGit. Fork it, tweak it, love it; Docker's in, HF vibes are strong. Drop a 🌟 or a crazy idea, I'm pumped to jam with you all! 🪂
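As a rough illustration of the embed-and-rerank idea described in the post above (not DeepGit's actual code; the embedding model, query, and repo descriptions below are assumptions for the example), a minimal sentence-transformers sketch might look like this:

from sentence_transformers import SentenceTransformer, util

# Hypothetical example: rank candidate repos against a natural-language query
# by cosine similarity of their README/description embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

query = "lightweight library for time-series anomaly detection in Python"
repo_descriptions = [
    "A tiny Python toolkit for detecting anomalies in time-series data.",
    "A web framework for building REST APIs quickly.",
    "GPU-accelerated deep learning for tabular data.",
]

query_emb = model.encode(query, convert_to_tensor=True)
repo_embs = model.encode(repo_descriptions, convert_to_tensor=True)

# Cosine similarity between the query and every repo description.
scores = util.cos_sim(query_emb, repo_embs)[0]

# Highest-scoring repos first; a cross-encoder re-ranker could then refine this list.
for score, desc in sorted(zip(scores.tolist(), repo_descriptions), reverse=True):
    print(f"{score:.3f}  {desc}")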
reacted to giadap's post with 🔥 5 days ago
We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.

I found three fundamental challenges:
- Scope problem: how can you know what you're agreeing to when AI could use your data in different ways?
- Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it.
- Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.

Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.

Available here: https://huggingface.co/blog/giadap/beyond-consent
reacted to MikeDoes's post with 🚀 11 days ago
🚀 Ai4Privacy Team is excited to unveil PII-Masking-1M, our most significant release yet! 🎉

This publication series 📦 includes datasets 📊, models 🤖, and applications ⚙️ to advance PII masking with AI systems 🛡️

Starting on Monday with daily posts at 7 PM CET ⏰
reacted to chansung's post with ❤️ 11 days ago
Mistral AI Small 3.1 24B is not only free for commercial use but also the best model for single-GPU deployment.

I packed up all the information you need to know in a single picture. Hope this helps! :)
reacted to MikeDoes's post with 🚀 12 days ago
#PII Masking Tech that does not **** around!

We are happy to release the OpenPII English Anonymiser, the most powerful open-source tool for redacting sensitive info from English text.

We fine-tuned ModernBERT on 5.7 million+ PII examples, and it's clocking 99%+ accuracy across emails, dates, social numbers, and more!

Why it's a big deal:
✅ Top-tier precision: 100% for passport numbers, 99.96% for emails*.
✅ Totally free: MIT license for personal or commercial use.
✅ No secrets: full metrics shared on Hugging Face.

#AI #OpenSource #DataSecurity @huggingface

Day 2 of 7 of PII-Masking-1M announcements complete!

*Accuracies reported from the new OpenPII-500k dataset

ai4privacy/llama-ai4privacy-english-anonymiser-openpii
reacted to aifeifei798's post with 👍 14 days ago
😊 This program removes emojis from a given text. It uses a regular expression (regex) pattern to match emojis and replace them with an empty string. The pattern covers ranges of Unicode characters corresponding to various types of emojis, such as emoticons, symbols, and flags. It is useful for cleaning up text data for text processing, analysis, or other applications where emojis are not desired. 💻
import re

def remove_emojis(text):
    # Define a broader emoji pattern
    emoji_pattern = re.compile(
        "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
        u"\U0001FA00-\U0001FA6F"  # chess symbols and more emojis
        u"\U0001FA70-\U0001FAFF"  # more symbols and pictographs
        u"\U00002600-\U000026FF"  # miscellaneous symbols
        u"\U00002B50-\U00002B59"  # additional symbols
        u"\U0000200D"             # zero width joiner
        u"\U0000200C"             # zero width non-joiner
        u"\U0000FE0F"             # emoji variation selector
        "]+", flags=re.UNICODE
    )
    return emoji_pattern.sub(r'', text)
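For example, calling it on a short string (the sample text below is my own, added for illustration):

# Example usage of remove_emojis; the sample string is illustrative.
text = "Launching the new model today 🚀🔥 so excited 😊"
print(remove_emojis(text))
# Prints: "Launching the new model today  so excited " (emojis stripped, surrounding spaces kept)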
reacted to julien-c's post with 👍 19 days ago
Important notice 🚨

For Inference Providers who have built support for our Billing API (currently: Fal, Novita, HF-Inference, with more coming soon), we've started enabling Pay as you go (= PAYG).

What this means is that you can use those Inference Providers beyond the free included credits, and the usage is charged to your HF account.

You can see it in this view: any provider that does not have a "Billing disabled" badge is PAYG-compatible.
reacted to albertvillanova's post with 🔥 21 days ago
🚀 Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments. 🦾🔒

Here's why this is a game-changer for agent-based systems: 🧵👇

1️⃣ Security First 🔐
Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.

2️⃣ Deterministic & Reproducible Runs 📦
By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting: no more environment mismatches or dependency issues!

3️⃣ Resource Control & Limits 🚦
Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don't spiral out of control.

4️⃣ Safer Code Execution in Production 🏭
Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.

5️⃣ Easy to Integrate 🛠️
With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend, with no need for complex security setups! (A rough configuration sketch is included at the end of this post.)

6️⃣ Perfect for Autonomous AI Agents 🤖
If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.

⚡ Get started now: https://github.com/huggingface/smolagents

What will you build with smolagents? Let us know! 🚀💡
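As a rough sketch of what this configuration might look like (hedged: the executor_type argument and the HfApiModel default below are assumptions about the smolagents API at the time of writing, so check the smolagents docs for the exact names):

from smolagents import CodeAgent, HfApiModel

# Assumed API: selecting a sandboxed backend via a constructor argument.
# Requires a running Docker daemon (or an E2B API key for executor_type="e2b").
agent = CodeAgent(
    tools=[],
    model=HfApiModel(),       # default Hub-hosted model endpoint
    executor_type="docker",   # assumption: "docker" or "e2b" instead of the local interpreter
)

# Generated Python code now runs inside the ephemeral sandbox, not on the host.
agent.run("Compute the 20th Fibonacci number and print it.")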
reacted to albertvillanova's post with 👍 24 days ago
🚀 New smolagents update: Safer Local Python Execution! 🦾🐍

With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. 🔒

Here's why this matters & what you need to know! 🧵👇

1️⃣ Why is local execution risky? ⚠️
AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.

2️⃣ New Safety Layer in smolagents 🛡️
We now inspect every return value during execution:
✅ Allowed: Safe built-in types (e.g., numbers, strings, lists)
⛔ Blocked: Dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)
(A conceptual sketch of this kind of check appears at the end of this post.)

3️⃣ Immediate Benefits 💡
- Prevent agents from accessing unsafe builtins
- Block unauthorized file or network access
- Reduce accidental security vulnerabilities

4️⃣ Security Disclaimer ⚠️
🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨
If you need true isolation, use a remote sandboxed executor like Docker or E2B.

5️⃣ The Best Practice: Use Sandboxed Execution 🔐
For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.

6️⃣ Upgrade Now & Stay Safe! 🚀
Check out the latest smolagents release and start building safer AI agents today.

🔗 https://github.com/huggingface/smolagents

What security measures do you take when running AI-generated code? Let's discuss! 👇

#AI #smolagents #Python #Security
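As a conceptual illustration of this kind of check (not smolagents' actual implementation; the blocklist, allowlist, and function name here are my own, simplified for the example):

import os
import shutil
import subprocess
import types

# Conceptual sketch only: inspect every value an agent-evaluated expression
# returns and reject anything outside a safe allowlist.
SAFE_TYPES = (int, float, str, bool, bytes, list, dict, tuple, set, type(None))
BLOCKED_CALLABLES = {os.system, subprocess.run, eval, exec, shutil.rmtree}

def check_return_value(value):
    """Allow only safe built-in types; name the reason when blocking."""
    if isinstance(value, types.ModuleType):
        raise ValueError(f"Blocked module: {value.__name__}")
    if callable(value) and value in BLOCKED_CALLABLES:
        raise ValueError("Blocked dangerous callable")
    if not isinstance(value, SAFE_TYPES):
        raise ValueError(f"Unsupported return type: {type(value).__name__}")
    return value

check_return_value([1, 2, 3])    # passes: safe built-in type
# check_return_value(os.system)  # would raise ValueError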
reacted to onekq's post with 🚀 27 days ago
I was puzzled by the scope of 🐋DeepSeek🐋 projects, i.e. why they built (then open-sourced) so many pieces which are all over their technology stack. Good engineers are minimalists. They build only when they have to.

Then I realized that FP8 should be the main driving force here. So your raw inter-GPU bandwidth is cut in half (H800). But if you compress your data representation from 16 bits to 8 bits, then the effective throughput of your workload stays unchanged!

The idea is simple but lots of work had to be done. Their v3 technical report will give you a holistic view (better than reading the code). To summarize, data structure is the foundation of any software. Since FP8 was new and untried, the ecosystem wasn't there. So DeepSeek became the trailblazer. Before cooking your meals, you need to till the land, grow crops, and grind the flour 😅
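To make the bandwidth argument concrete, here is a back-of-the-envelope sketch in Python (the bandwidth numbers are illustrative placeholders, not the actual H100/H800 specs):

# Illustrative arithmetic only: a link with half the bandwidth still moves the
# same number of values per second if each value uses half the bytes.
full_bandwidth_gb_s = 800      # hypothetical unrestricted inter-GPU bandwidth
halved_bandwidth_gb_s = 400    # hypothetical restricted (H800-style) bandwidth

bytes_per_value_fp16 = 2
bytes_per_value_fp8 = 1

# Values transferred per second = bandwidth / bytes per value.
throughput_fp16_on_full = full_bandwidth_gb_s / bytes_per_value_fp16    # 400 G values/s
throughput_fp8_on_halved = halved_bandwidth_gb_s / bytes_per_value_fp8  # 400 G values/s

print(throughput_fp16_on_full == throughput_fp8_on_halved)  # True: effective throughput unchanged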
reacted to nicolay-r's post with 👍 2 months ago
📢 For those who wish to launch distilled DeepSeek R1 for reasoning with schema, sharing the Google Colab notebook:
📙 https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_colab.ipynb
This is a wrapper of the Qwen2 transformers 🤗 provider via the bulk-chain framework.
Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
GPU: a T4 (15 GB) is just about enough in float32 mode.
🚀 To boost performance, you may enable bf16 mode (use_bf16=True)
🌟 Powered by bulk-chain: https://github.com/nicolay-r/bulk-chain
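For reference, a plain transformers sketch of loading the same checkpoint in bf16 (this is not the bulk-chain wrapper itself; the prompt and generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the distilled model in bfloat16 to roughly halve memory vs. float32.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # analogous to bulk-chain's use_bf16=True
    device_map="auto",
)

# Illustrative prompt; the model emits its reasoning before the final answer.
inputs = tokenizer("What is 17 * 24? Think step by step.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))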
reacted to m-ric's post with 🔥 2 months ago
๐—ง๐—ต๐—ฒ ๐—›๐˜‚๐—ฏ ๐˜„๐—ฒ๐—น๐—ฐ๐—ผ๐—บ๐—ฒ๐˜€ ๐—ฒ๐˜…๐˜๐—ฒ๐—ฟ๐—ป๐—ฎ๐—น ๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—ฝ๐—ฟ๐—ผ๐˜ƒ๐—ถ๐—ฑ๐—ฒ๐—ฟ๐˜€!

โœ… Hosting our own inference was not enough: now the Hub 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.

Check model cards on the Hub: you can now, in 1 click, use inference from various providers (cf video demo)

Their inference can also be used through our Inference API client. There, you can use either your custom provider key, or your HF token, then billing will be handled directly on your HF account, as a way to centralize all expenses.

๐Ÿ’ธ Also, PRO users get 2$ inference credits per month!

Read more in the announcement ๐Ÿ‘‰ https://huggingface.co/blog/inference-providers
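As a rough sketch of what calling one of these providers through the client can look like (hedged: the provider name, model choice, and prompt below are assumptions based on recent huggingface_hub releases, so check the docs for the exact arguments):

from huggingface_hub import InferenceClient

# Route the request through an external provider; per the post above, billing
# lands on the HF account when authenticating with an HF token (assumption).
client = InferenceClient(provider="together", api_key="hf_...")  # or a provider-specific key

response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarize what pay-as-you-go inference means."}],
    max_tokens=200,
)
print(response.choices[0].message.content)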
reacted to merve's post with 👍 2 months ago
Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM, the tiniest VLMs, coming in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B, and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, plus six distilled dense models, on par with o1 and MIT-licensed! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images