sklearn-docs (scikit-learn)

Nymbo

posted an update 3 days ago

I built a general-use MCP space ~ fetch webpages, DuckDuckGo search, Python code execution, Kokoro TTS, image gen, video gen.

# Tools

1. Fetch webpage
2. Web search via DuckDuckGo (very concise, low excess context)
3. Python code executor
4. Kokoro-82M speech generation
5. Image Generation (use any model from HF Inference Providers)
6. Video Generation (use any model from HF Inference Providers)

The first four tools can be used without any API keys whatsoever: DDG search is free, and code execution and speech generation run on CPU. Setting an HF_READ_TOKEN environment variable shows all tools; if no key is present, the Image/Video Gen tools are hidden.

Nymbo/Tools
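The token-gated tool list described in this post can be sketched as a small function that always exposes the keyless tools and adds the image/video tools only when `HF_READ_TOKEN` is set. This is a minimal sketch, not the space's actual code: the tool names and the `available_tools` helper are hypothetical, and only the `HF_READ_TOKEN` variable name comes from the post.

```python
import os

# Hypothetical tool names; the real space's identifiers are not given in the post.
KEYLESS_TOOLS = ["fetch_webpage", "search_duckduckgo", "run_python", "kokoro_tts"]
TOKEN_TOOLS = ["generate_image", "generate_video"]


def available_tools(env=None):
    """Return the tools to expose; token-gated tools are hidden without HF_READ_TOKEN."""
    env = os.environ if env is None else env
    tools = list(KEYLESS_TOOLS)
    # An unset or empty token is treated as "no key present".
    if env.get("HF_READ_TOKEN"):
        tools.extend(TOKEN_TOOLS)
    return tools


if __name__ == "__main__":
    print(available_tools())
```

Passing a plain dict as `env` keeps the check easy to test without touching the real process environment.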

Nymbo

posted an update 11 days ago

If you're using Jan-v1-4B for local MCP-based web search, I highly recommend trying out Intelligent-Internet/II-Search-4B.

Very impressed with this lil guy; it deserves more downloads. It's based on the original version of Qwen3-4B, but I find that it questions reality way less often. Jan-v1 seems to think that everything it sees is synthetic data and constantly gaslights me.

ZennyKenny

posted an update 15 days ago

It's just a matter of time before all the data leakage and data scraping associated with building, training, and using AI results in some kind of major scandal.

That's why I think this paper by @spintronic is so important: Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN (2508.06647)

Glad to know that there are already researchers looking to mitigate and address this risk before the s**t hits the fan.

1024m

authored a paper 19 days ago

Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering

Paper • 2508.04683 • Published 21 days ago

1024m

authored a paper 21 days ago

DSBC : Data Science task Benchmarking with Context engineering

Paper • 2507.23336 • Published 28 days ago

Tonic

posted an update 25 days ago

🫡 I am the first and only one to like the French Tax Code Dataset

That's it, that's the post.

Find the dataset here: louisbrulenaudet/code-impots
Follow: @louisbrulenaudet

Tonic

posted an update about 1 month ago

👋 Hey there folks,

Just submitted my plugin idea to the G-Assist Plugin Hackathon by @nvidia. Check it out: it's a great way to use a local SLM (small language model) on a Windows machine to get things done easily and locally! https://github.com/NVIDIA/G-Assist

Tonic

posted an update about 1 month ago

🙋🏻‍♂️ Hey there folks,

Yesterday, Nvidia released a reasoning model that beats o3 on science, math and coding!

Today you can try it out here: Tonic/Nvidia-OpenReasoning

Hope you like it!

Tonic

posted an update about 2 months ago

🙋🏻‍♂️ Normalize adding compute & runtime traces to your model cards

Tonic

posted an update about 2 months ago

Who's going to Raise Summit in Paris tomorrow?

If you're around, I would love to meet you :-)

Nymbo

posted an update about 2 months ago

Anyone know how to reset Claude web's MCP config? I connected mine when the HF MCP first released with just the default example spaces added. I added lots of other MCP spaces but Claude.ai doesn't update the available tools... "Disconnecting" the HF integration does nothing, deleting it and adding it again does nothing.

Refreshing tools works fine in VS Code because I can manually restart it in mcp.json, but claude.ai has no such option. Anyone got any ideas?

Tonic

posted an update 3 months ago

🙋🏻‍♂️ Hey there folks,

At every bio/med/chem meeting I go to, I always get the same questions: "Why are you sharing a gdrive link with me for this?" and "Do you have any plans to publish your model weights and datasets on Hugging Face?" Today I finally got an answer that explains everything:

Basically, there is some kind of government censorship on this (in the USA, but I'm sure elsewhere too): researchers are told they are not allowed to share, because it is considered a "data leak", which is illegal!

This is terrible! But the good news is that we can do something about it!

The NIH (USA) has opened a "call for opinions and comments", where we can make our opinions on this topic known: https://osp.od.nih.gov/comment-form-responsibly-developing-and-sharing-generative-artificial-intelligence-tools-using-nih-controlled-access-data/

Kindly consider submitting your opinion and thoughts about this censorship of science, and share this post, link, or your thoughts widely.

Together maybe we can start to share data and model weights appropriately and openly in a good way 🙏🏻🚀

cc. @cyrilzakka

1024m

authored a paper 3 months ago

Uncovering Cultural Representation Disparities in Vision-Language Models

Paper • 2505.14729 • Published May 20

Tonic

posted an update 3 months ago

🙋🏻‍♂️ Hey there folks,

Yesterday the world's first "Learn to Vibe Code" application was released.

Now that vibe coding is the mainstream paradigm, the first educational app to support it is here.

You can try it out already:

https://vibe.takara.ai

And of course it's entirely open source, so I've already opened my issue and feature branch :-) 🚀

Nymbo

posted an update 4 months ago

Haven't seen this posted anywhere - Llama-3.3-8B-Instruct is available on the new Llama API. Is this a new model or did someone mislabel Llama-3.1-8B?

ZennyKenny

posted an update 4 months ago

Community! 💡💡💡

It's the last day to submit your datasets for the Reasoning Datasets Competition: https://www.bespokelabs.ai/blog/reasoning-datasets-competition

Here are my submissions:
- ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
- ZennyKenny/cosa-benchmark-dataset
- ZennyKenny/tactical-military-reasoning-v.1.0
- ZennyKenny/tron-dataset-v.1.0

Have a look and drop a ❤️ or comment! Check out the entire collection of submissions here: https://huggingface.co/datasets?other=reasoning-datasets-competition

ZennyKenny

posted an update 4 months ago

After hearing the news that Marc Andreessen thinks that the only job that is safe from AI replacement is venture capital: https://gizmodo.com/marc-andreessen-says-one-job-is-mostly-safe-from-ai-venture-capitalist-2000596506 🧠🧠🧠

The Reasoned Capital synthetic dataset suddenly feels much more topical: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset 🔥🔥🔥

Really looking forward to potentially expanding this architecture and seeing how algorithmic clever investment truly is! 💰💰💰

ZennyKenny

posted an update 4 months ago

When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥

With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.

In response to that, I'm happy to present my final submission to the Reasoning Dataset Competition and attempt to start benchmarking the ability of LLMs to identify unsafe and / or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset

It's currently a curated set of 200 examples, calibrated on OpenAI's standard-issue models (GPT-4.1, o4-mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful, or hit the Community section with suggestions/critiques.

Nymbo

posted an update 4 months ago

PSA for anyone using Nymbo/Nymbo_Theme or Nymbo/Nymbo_Theme_5 in a Gradio space ~

Both of these themes have been updated to fix some long-standing inconsistencies that have been around since the transition to Gradio v5. Textboxes are no longer bright green and in-line code is readable now! Both themes are now visually identical across versions.

If your space is already using one of these themes, you just need to restart your space to get the latest version. No code changes needed.

ZennyKenny

posted an update 4 months ago

In the same way that the advent of Adobe Illustrator led to innovation in how creative professionals work, I earnestly believe that AI will do the same (contrary to the popular opinion that it represents some regression in the world of creatives).

@natalika and I were speaking about this topic and like most illustrators she has some understandable concerns about the spread of AI in her field. She also told me how much time she spends generating concept art that will never see the light of day in >98% of cases. 💡

To me, that sounded like a perfect opportunity to leverage image diffusion in a way that helps artists spend more time creating cool stuff rather than just malevolently mining their work and using it without credit. Using the Black Forest Labs base model FLUX, Replicate, and about $5 of H100 compute, I post-trained a LoRA adapter on a set of her images associated with one project she's working on and spun up an app with Hugging Face Spaces (and Zero GPU for the win).

I give you, Natalie Diffusion: ZennyKenny/natalie-diffusion

Now, generating concept art in her particular style takes seconds instead of hours and when it's time to put the work into production, a human designer is still invaluable. And building it in the open hopefully inspires other use cases amongst other designers. 🖖

Team members 135

sklearn-docs's activity