Page : https://huggingface.co/strangerzonehf Describe the artistic properties by posting sample images or links to similar images in the request discussion. If the adapters you're asking for are truly creative and safe for work, I'll train and upload the LoRA to the Stranger Zone repo!
Ever wanted 45 min with one of AIβs most fascinating minds? Was with @thomwolf at HumanX Vegas. Sharing my notes of his Q&A with the pressβcompletely changed how I think about AIβs future:
1οΈβ£ The next wave of successful AI companies wonβt be defined by who has the best model but by who builds the most useful real-world solutions. "We all have engines in our cars, but thatβs rarely the only reason we buy one. We expect it to work well, and thatβs enough. LLMs will be the same."
2οΈβ£ Big players are pivoting: "Closed-source companiesβOpenAI being the firstβhave largely shifted from LLM announcements to product announcements."
3οΈβ£ Open source is changing everything: "DeepSeek was open source AIβs ChatGPT moment. Basically, everyone outside the bubble realized you can get a model for freeβand itβs just as good as the paid ones."
4οΈβ£ Product innovation is being democratized: Take Manus, for exampleβthey built a product on top of Anthropicβs models thatβs "actually better than Anthropicβs own product for now, in terms of agents." This proves that anyone can build great products with existing models.
Weβre entering a "multi-LLM world," where models are becoming commoditized, and all the tools to build are readily availableβjust look at the flurry of daily new releases on Hugging Face.
Thom's comparison to the internet era is spot-on: "In the beginning you made a lot of money by making websites... but nowadays the huge internet companies are not the companies that built websites. Like Airbnb, Uber, Facebook, they just use the internet as a medium to make something for real life use cases."
Honored to be named among their 12 pioneers and power players in the news industry in the 2025 Tech Trends Report from Future Today Strategy Group.
Incredible group to be part of - each person is doing groundbreaking work at the intersection of AI and journalism. Worth following them all: they're consistently sharing practical insights on building the future of news.
Take the time to read this report, it's packed with insights as always. The news & information section's #1 insight hits hard: "The most substantive economic impact of AI to date has been licensing payouts for a handful of big publishers. The competition will start shifting in the year ahead to separate AI 'haves' that have positioned themselves to grow from the 'have-nots.'"
This AI-driven divide is something I've been really concerned about. Now is the time to build more than ever!
Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!
### Model Details - **Model Name**: [lettucedect-large-modernbert-en-v1](KRLabsOrg/lettucedect-large-modernbert-en-v1) - **Organization**: [KRLabsOrg](https://huggingface.co/KRLabsOrg) - **Github**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) - **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens - **Task**: Token Classification / Hallucination Detection - **Training Dataset**: [RagTruth](wandb/RAGTruth-processed) - **Language**: English - **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.
LettuceDetect excels at processing long documents to determine if an answer aligns with the provided context, making it a powerful tool for ensuring factual accuracy.
We just published the LlamaIndex unit for the agents course, and it is set to offer a great contrast between the smolagents unit by looking at
- What makes llama-index stand-out - How the LlamaHub is used for integrations - Creating QueryEngine components - Using agents and tools - Agentic and multi-agent workflows
The team has been working flat-out on this for a few weeks. Supported by Logan Markewich and Laurie Voss over at LlamaIndex.
π ftBoost is LIVE β Stop Struggling with Fine-Tuning Data!
Alright folks, if youβre tired of manually crafting fine-tuning datasets, ftBoost is here to do the heavy lifting. One-click, LangChain-Groq-powered data augmentation that scales your training data in OpenAI, Gemini, Mistral, and LLaMA formatsβautomatically.
π₯ Whatβs inside? β Smart Augmentations β Paraphrasing, back translation, synonym swapping & synthetic noise. β No more JSONL headaches β Auto-formats everything for OpenAI, Gemini, Mistral & LLaMA. β Custom tuning β Adjust similarity, diversity, and fluency in real-time. β Upload, generate, download β Thatβs it.
β‘ If youβre fine-tuning LLMs, this will save you hours.
What if AI becomes as ubiquitous as the internet, but runs locally and transparently on our devices?
Fascinating TED talk by @thomwolf on open source AI and its future impact.
Imagine this for AI: instead of black box models running in distant data centers, we get transparent AI that runs locally on our phones and laptops, often without needing internet access. If the original team moves on? No problem - resilience is one of the beauties of open source. Anyone (companies, collectives, or individuals) can adapt and fix these models.
This is a compelling vision of AI's future that solves many of today's concerns around AI transparency and centralized control.
π Introducing "Hugging Face Dataset Spotlight" π
I'm excited to share the first episode of our AI-generated podcast series focusing on nice datasets from the Hugging Face Hub!
This first episode explores mathematical reasoning datasets:
- SynthLabsAI/Big-Math-RL-Verified: Over 250,000 rigorously verified problems spanning multiple difficulty levels and mathematical domains - open-r1/OpenR1-Math-220k: 220,000 math problems with multiple reasoning traces, verified for accuracy using Math Verify and Llama-3.3-70B models. - facebook/natural_reasoning: 1.1 million general reasoning questions carefully deduplicated and decontaminated from existing benchmarks, showing superior scaling effects when training models like Llama3.1-8B-Instruct.
Is this the best tool to extract clean info from PDFs, handwriting and complex documents yet?
Open source olmOCR just dropped and the results are impressive.
Tested the free demo with various documents, including a handwritten Claes Oldenburg letter. The speed is impressive: 3000 tokens/second on your own GPU - that's 1/32 the cost of GPT-4o ($190/million pages). Game-changer for content extraction and digital archives.
To achieve this, Ai2 trained a 7B vision language model on 260K pages from 100K PDFs using "document anchoring" - combining PDF metadata with page images.
Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.