✨ Multiple content modalities (text, images, video thumbnails)
✨ Rich user interaction data (from Xiaohongshu's 300M+ MAUs, 70%+ search penetration)
✨ Comprehensive evaluation metrics
✨ Support for RAG system development
🚀 Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments. 🦾🔒
Here's why this is a game-changer for agent-based systems: 🧵👇
1️⃣ Security First 🔐 Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.
2️⃣ Deterministic & Reproducible Runs 📦 By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting—no more environment mismatches or dependency issues!
3️⃣ Resource Control & Limits 🚦 Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don’t spiral out of control.
4️⃣ Safer Code Execution in Production 🏭 Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.
5️⃣ Easy to Integrate 🛠️ With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend—no need for complex security setups! (Minimal sketch at the end of this thread 👇)
6️⃣ Perfect for Autonomous AI Agents 🤖 If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.
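Here's roughly what that looks like in code. A minimal sketch, assuming the executor_type parameter and InferenceClientModel wrapper from recent smolagents releases; check the docs for your installed version:

```python
# Minimal sketch: run a CodeAgent inside a sandboxed executor.
# Assumes the executor_type parameter and InferenceClientModel wrapper
# from recent smolagents releases; verify against your installed version.
from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel()  # any supported model wrapper works here

agent = CodeAgent(
    tools=[],
    model=model,
    executor_type="docker",  # or "e2b" for a hosted cloud sandbox
)

# Generated code now runs inside the container, not on your host machine.
agent.run("Compute the 20th Fibonacci number in Python.")
```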
Published a stable version of a Ukrainian Text-to-Speech library on GitHub and PyPI.
Features:
- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy
- High-fidelity speech generation using the RAD-TTS++ acoustic model
- Fast vocoding using Vocos
- Synthesizes long sentences effectively
- Supports a sampling rate of 44.1 kHz
- Tested on Linux environments and Windows/WSL
- Python API (requires Python 3.9 or later)
- CUDA-enabled for GPU acceleration
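To give a flavor of the Python API, here is a hypothetical usage sketch; the import path, class, and argument names below are illustrative placeholders, not the library's confirmed API, so check the README on GitHub for the real entry points:

```python
# Hypothetical usage sketch: the import path, class, and argument
# names are illustrative placeholders; see the README for the real API.
import soundfile as sf  # assumed here just to write the waveform

from ukrainian_tts import TTS  # placeholder import path

tts = TTS(device="cuda")  # CUDA-enabled for GPU acceleration

# Multi-speaker: "tetiana" or "lada" (female), "mykyta" (male)
audio = tts.synthesize("Привіт, як справи?", voice="tetiana")

sf.write("output.wav", audio, 44100)  # 44.1 kHz sampling rate
```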
Or check it out in the linked HuggingFace dataset!
What makes this dataset unique, useful, and capable of bridging the Sim2Real gap?
💠 The digital twins are not generated by AI; they are crafted by 3D artists to be INDISTINGUISHABLE from the physical-world objects. This allows models trained on this data to transfer to real-world applications.
💠 The simulation software, called FalconEditor, can easily create thousands of images with varying lighting, posing, occlusions, backgrounds, camera positions, and more. This enables robust model training.
💠 The labels are created along with the data. This not only saves large amounts of time, but also ensures the labels are incredibly accurate and reliable.
If you want to create your own thinking model or build a better MistralThinker, I just uploaded my entire dataset (generated with DeepSeek R1) and the axolotl config. (Well, I made them public.)
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!
I was puzzled by the scope of 🐋DeepSeek🐋 projects, i.e. why they built (and then open-sourced) so many pieces across their technology stack. Good engineers are minimalists: they build only when they have to.
Then I realized that FP8 should be the main driving force here. On the H800, your raw inter-GPU bandwidth is cut in half. But if you compress your data representation from 16 bits to 8 bits, the effective throughput of your workload stays unchanged!
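A quick back-of-envelope check of that claim (the bandwidth numbers below are illustrative placeholders, not real spec-sheet figures):

```python
# Back-of-envelope: halved interconnect bandwidth, offset by FP8.
# Bandwidth figures are illustrative placeholders, not real specs.
elements = 1e9       # tensor elements to exchange between GPUs
full_bw = 900e9      # hypothetical full bandwidth, bytes/s
halved_bw = 450e9    # hypothetical halved (H800-like) bandwidth

t_fp16_full = (elements * 2) / full_bw      # FP16 = 2 bytes/element
t_fp8_halved = (elements * 1) / halved_bw   # FP8  = 1 byte/element

print(f"FP16 over full bandwidth:  {t_fp16_full * 1e3:.2f} ms")
print(f"FP8 over halved bandwidth: {t_fp8_halved * 1e3:.2f} ms")
# Both print ~2.22 ms: same transfer time, so the workload's
# effective throughput is unchanged.
```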
The idea is simple, but a lot of work had to be done. Their v3 technical report will give you a holistic view (better than reading the code). To summarize: data structures are the foundation of any software. Since FP8 was new and untried, the ecosystem wasn't there, so DeepSeek became the trailblazer. Before cooking your meals, you need to till the land, grow crops, and grind the flour 😅
At the time of posting, watt-ai/watt-tool-70B continues to top the Berkeley Function-Calling Leaderboard, with the 8B version occupying 4th place. A remarkable achievement for a model of that size!
Exciting New Tool for Knowledge Graph Extraction from Plain Text!
I just came across a groundbreaking new tool called KGGen that's solving a major challenge in the AI world - the scarcity of high-quality knowledge graph data.
KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.
The technical approach is fascinating:
1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph
The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").
The researchers from Stanford and the University of Toronto also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods like OpenIE and GraphRAG, KGGen outperformed them by up to 18%.
For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.
The package is available via pip install kg-gen, making it accessible to everyone. This could be a game-changer for knowledge graph applications!
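A minimal usage sketch of that workflow; the method and argument names follow the project README as I understand it, but may drift between releases, so verify against the current docs:

```python
# Minimal sketch of the KGGen workflow described above; method and
# argument names may drift between releases, so verify them.
from kg_gen import KGGen

kg = KGGen(model="openai/gpt-4o")  # the paper's implementation uses GPT-4o

text = "Linus Torvalds created Linux. Linux powers Android."

# Stage 1: extract entities and relations from the source text
graph = kg.generate(input_data=text)

# Stage 3: iterative LM-based clustering merges nodes that refer to
# the same entity (tense, plurality, capitalization variants)
clustered = kg.cluster(graph)

print(clustered.entities)
print(clustered.relations)
```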
Made a few improvements on my custom GRPO trainer:
- added sequence similarity reward (seems to work; sketch below)
- improved vLLM support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to HF Hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)
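For the curious, a sequence-similarity reward can be as simple as this sketch (my guess at the general shape, not the trainer's actual code):

```python
# Hedged sketch of a sequence-similarity reward: score completions
# by closeness to a reference; not the trainer's actual code.
from difflib import SequenceMatcher

def sequence_similarity_reward(completions, references):
    """Return one reward in [0, 1] per completion."""
    return [
        SequenceMatcher(None, c, r).ratio()
        for c, r in zip(completions, references)
    ]

print(sequence_similarity_reward(
    ["The answer is 42.", "I refuse to answer."],
    ["The answer is 42.", "The answer is 42."],
))  # first ~1.0, second much lower
```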
Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues proving its effectiveness, especially in top-performing reasoning models. However, there are other similar methods that expand on CoT and can be used for different purposes. Here are 9 of them:
4. Chain-of-RAG -> https://huggingface.co/papers/2501.14342 Creates retrieval chains instead of retrieving all info at once. It can dynamically adjust its search process and parameters such as the number of steps (see the sketch after this list)
9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok Enhances LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
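To make point 4 concrete, here is an illustrative loop of the iterative-retrieval idea; the retrieve() and llm() helpers are hypothetical placeholders, and this is a sketch of the concept, not the paper's algorithm:

```python
# Illustrative Chain-of-RAG-style loop: retrieve iteratively and let
# each intermediate answer shape the next query. retrieve() and llm()
# are hypothetical placeholders; this is not the paper's algorithm.
def chain_of_rag(question, retrieve, llm, max_steps=4):
    evidence, query = [], question
    for _ in range(max_steps):
        evidence.extend(retrieve(query))  # small, focused batch per step
        followup = llm(
            f"Question: {question}\nEvidence so far: {evidence}\n"
            "Reply DONE if answerable, otherwise give the next search query."
        )
        if followup.strip() == "DONE":    # the model decides when to stop
            break
        query = followup                  # dynamically adjusted search
    return llm(f"Answer the question '{question}' using: {evidence}")
```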
and it's ranked the number one model under the 25B parameter size mark.
Now, I said "I think," not "I am sure," because this model used the same evaluation metric the AraGen developers use (3C3H) as a reward model to improve its responses, and this sparks a question: is this something good for users, or is it another type of overfitting that we don't want?
I don't know if this is a good thing or a bad thing, but what I do know is that you can try it here: Navid-AI/Yehia-7B-preview
MoD ControlNet Tile Upscaler for SDXL: Upscale Your Images with Ease! 🚀
Meet the MoD ControlNet Tile Upscaler for SDXL, a powerful tool that uses advanced technology to upscale your images without losing quality! Our app processes images in tiles without leaving them blurry or showing visible seams between the tiles. The result? Upscaled images with preserved details and smooth, natural transitions—all through a user-friendly interface. ✨
What MoD Upscaler Offers:
🔍 Preserved Details: Unlike traditional upscalers, the MoD ControlNet Tile Upscaler enlarges your images while maintaining clarity and adding details that might otherwise be lost. Your photos gain more definition without sacrificing original quality.
🧩 Advanced Tiling Technology: We use a smart combination of techniques to ensure natural and smooth transitions between tiles. This means your upscaled images remain consistent and high-quality, even at higher resolutions. No more visible lines or imperfections!
⚡ Fast and Efficient: You don’t need a super-powered computer! Our app is optimized to run quickly and smoothly, even on simpler machines.
🎨 Easy-to-Use Interface: You don’t have to be an expert to use the MoD ControlNet Tile Upscaler. The interface is simple, intuitive, and designed so anyone can achieve professional-quality results without hassle.
Upscale your images without losing quality and with details preserved. Try the MoD ControlNet Tile Upscaler today! 👍
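For the curious, seamless tiles are classically achieved with overlapping tiles and feathered blending. Here is a generic numpy sketch of that idea (an illustration of the general technique, not our app's actual pipeline):

```python
# Generic overlap-and-blend tiling sketch; illustrates the classic
# seam-hiding trick, NOT the MoD app's actual pipeline.
import numpy as np

def _feather(ph, pw, f):
    """Weight mask that fades toward the tile borders over f pixels."""
    m = np.ones((ph, pw, 1), dtype=np.float32)
    r = np.linspace(0.05, 1.0, f, dtype=np.float32)
    fy, fx = min(f, ph), min(f, pw)
    m[:fy] *= r[:fy, None, None]
    m[ph - fy:] *= r[:fy][::-1, None, None]
    m[:, :fx] *= r[None, :fx, None]
    m[:, pw - fx:] *= r[:fx][::-1][None, :, None]
    return m

def upscale_tiled(img, upscale_fn, tile=512, overlap=64, scale=2):
    """Upscale img tile by tile, cross-fading the overlapping seams.

    upscale_fn must return the patch enlarged by `scale`,
    e.g. one ControlNet Tile pass per patch.
    """
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=np.float32)
    wsum = np.zeros((h * scale, w * scale, 1), dtype=np.float32)
    for y in range(0, h, tile - overlap):
        for x in range(0, w, tile - overlap):
            up = upscale_fn(img[y:y + tile, x:x + tile])
            ph, pw, _ = up.shape
            m = _feather(ph, pw, overlap * scale)
            oy, ox = y * scale, x * scale
            out[oy:oy + ph, ox:ox + pw] += up * m
            wsum[oy:oy + ph, ox:ox + pw] += m
    # Normalizing by the accumulated weights cross-fades the overlaps.
    return out / np.maximum(wsum, 1e-8)
```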
I was just playing around with a Python MIDI library and Colab's code generation, and accidentally cooked up a quick n' dirty audio synthesis template. Have fun!
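Something in that spirit, assuming the third-party midiutil package (pip install MIDIUtil) rather than anything in the standard library:

```python
# Quick n' dirty MIDI sketch; assumes the third-party midiutil
# package (pip install MIDIUtil).
from midiutil import MIDIFile

midi = MIDIFile(1)  # one track
midi.addTempo(track=0, time=0, tempo=120)

# A rising C-major arpeggio, one note per beat
for beat, pitch in enumerate([60, 64, 67, 72]):
    midi.addNote(
        track=0, channel=0,
        pitch=pitch, time=beat,  # start time in beats
        duration=1, volume=100,
    )

with open("arpeggio.mid", "wb") as f:
    midi.writeFile(f)
```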
I'd like to draw your attention to a Lamarck-based experiment that uses Arcee AI's newly published arcee_fusion merge method for three of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:
A fusion merge - of a fusion merge and a SLERP of a fusion and an older merge - should demonstrate the new merge method's behavior in interesting ways, especially in the first quarter of the model, where the SLERP has less impact.
I welcome you to kick the tires and learn from it. It has prose quality near Qwenvergence v12's - as you'd expect.
We now have a Deep Research for academia: SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones 🔥
Researchers from Beijing and Shanghai just published the first application of a deep research system to academia: given a question, their algorithm can produce a survey of all papers on the subject.
To write a research survey, you generally follow two steps: preparation (collecting and organizing papers) and writing (outline creation, writing, polishing). The researchers followed the same two steps and automated them.
🎯 For the preparation part, a key step is finding all the important references on the given subject. The researchers first cast a wide net over all relevant papers. But then finding the really important ones is like finding needles in a haystack of information. To solve this challenge, they built an “AttributeTree” object that structures key information from citations. Ablating these AttributeTrees significantly decreased structure and synthesis scores, so they were really useful!
📝 For the writing part, the key was to get a synthesis that's both short and true. This is not easy to get from LLMs! So they used methods like LLM-based deduplication to shorten the overly verbose listings LLMs produce, and RAG to grab original quotes instead of made-up ones.
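As a stand-in to show the deduplication idea: SurveyX does this step with an LLM, but cosine similarity over embeddings plays the same role in this sketch, and embed() is a hypothetical placeholder returning unit-normalized vectors:

```python
# Stand-in sketch for the deduplication idea: SurveyX does this with
# an LLM; here cosine similarity over embeddings plays that role.
# embed() is a hypothetical placeholder returning unit-normalized rows.
def dedupe(sentences, embed, threshold=0.9):
    """Keep one representative per cluster of near-duplicate sentences."""
    vecs = embed(sentences)  # (n, d) numpy array, unit-normalized rows
    kept = []
    for i, v in enumerate(vecs):
        if all(v @ vecs[j] < threshold for j in kept):
            kept.append(i)   # not too close to anything already kept
    return [sentences[i] for i in kept]
```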
As a result, their system outperforms previous approaches by far!
As assessed by LLM judges, the quality score of SurveyX even approaches that of human experts, at 4.59/5 vs 4.75/5 🏆