Andrew Reed's picture

Andrew Reed

andrewrreed

AI & ML interests

Applied ML, Practical AI, Inference & Deployment, LLMs, Multi-modal Models, RAG

Recent Activity

updated a Space 9 days ago
Olto/langfuse-dashboard
published a Space 9 days ago
Olto/langfuse-dashboard
published a Space 9 days ago
xsolla/langfuse-shared
View all activity

Articles

Organizations

Hugging Face's profile picture Demo Corp's profile picture Atmos Bank's profile picture Hugging Test Lab's profile picture HuggingFaceM4's profile picture Cloudera Fast Forward Labs's profile picture Code Llama's profile picture Xlscout Ltd's profile picture Olto's profile picture Enterprise Explorers's profile picture Navigate360's profile picture Ryght AI's profile picture Marker Learning's profile picture Sanofi's profile picture Social Post Explorers's profile picture Xsolla's profile picture open/ acc's profile picture Langfuse's profile picture

andrewrreed's activity

posted an update 18 days ago
view post
Post
2644
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production

Now available as a Docker Space directly on the HF Hub! 🤗

🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: Version, edit, and update without redeployment
✅ Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: Build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product they’re building! 👏 @marcklingen @Clemo @MJannik

🔗 Space: langfuse/langfuse-template-space
🔗 Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse
  • 1 reply
·
reacted to julien-c's post with 🤗❤️🔥 about 2 months ago
view post
Post
8678
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
·
posted an update 2 months ago
view post
Post
1007
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! 🚀

✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗔𝗣𝗜
3️⃣ Observe multi-agent application runs with the CrewAI integration

𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗶𝘀 𝗰𝗿𝘂𝗰𝗶𝗮𝗹 for building robust LLM apps.

Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!

🔗 Cookbook recipe: https://huggingface.co/learn/cookbook/en/phoenix_observability_on_hf_spaces
🔗 Phoenix docs: https://docs.arize.com/phoenix
reacted to m-ric's post with ❤️ 2 months ago
view post
Post
1220
Made a new app to visualize the LLM race ⇒ 𝗡𝗼 𝗘𝘂𝗿𝗼𝗽𝗲𝗮𝗻 𝗰𝗼𝗺𝗽𝗮𝗻𝘆 𝗶𝗻 𝘁𝗵𝗲 𝘁𝗼𝗽 𝟭𝟬 🇪🇺❌

See the app here 👉 m-ric/llm-race-to-the-top

I've adapted an app by @andrewrreed that tracks progress of LLMs ( andrewrreed/closed-vs-open-arena-elo), on the Chatbot Arena leaderboard, to compare companies from different countries.

The outcome is quite sad, as a Frenchman and European.

The top 10 is exclusively US 🇺🇸 and Chinese 🇨🇳 companies (after great Chinese LLM releases recently, like the Qwen2.5 series), with the notable exception of Mistral AI 🇫🇷.

American companies are making fast progress, Chinese ones even faster. Europe is at risk of being left behind. And the EU AI Act hasn't even come into force yet to slow down the EU market. We need to wake up 😬

⚠️ Caution: This Chatbot Arena ELO ranking is not the most accurate, especially at high scores like this, because LLM makers can game it to some extent.
  • 1 reply
·
reacted to jsulz's post with ❤️🔥 2 months ago
view post
Post
2925
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks

In our benchmarks, we found that using CDC to store iterative model and dataset version led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.

We're planning on our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks
reacted to m-ric's post with 🔥 2 months ago
view post
Post
3865
𝗧𝗵𝗲 𝗻𝗲𝘅𝘁 𝗯𝗶𝗴 𝘀𝗼𝗰𝗶𝗮𝗹 𝗻𝗲𝘁𝘄𝗼𝗿𝗸 𝗶𝘀 𝗻𝗼𝘁 🦋, 𝗶𝘁'𝘀 𝗛𝘂𝗯 𝗣𝗼𝘀𝘁𝘀! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

⚙️ Computed with the great dataset maxiw/hf-posts
⚙️ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far in front of me
·
reacted to maxiw's post with ❤️ 2 months ago
view post
Post
4646
I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
❤️: 7048 times
🔥: 5921 times
👍: 4856 times
🚀: 2549 times
🤗: 2065 times
·
reacted to clem's post with 🚀🔥 3 months ago
view post
Post
4450
This is no Woodstock AI but will be fun nonetheless haha. I’ll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.

1,000 spots available first-come first serve with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
·
reacted to melisa's post with 🔥 5 months ago
view post
Post
3023
🔥 Introducing "Writing in the Margins (WiM)" - better inference pattern for long context LLMs that solves the Lost-in-the-Middle problem 🔥

Paper page: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)

TL;DR
Make your model write "margin notes" as you chunk prefill the KV cache. Then ask it reread all notes before it speaks up.
Works with humans, works with AI 🤖

WiM leverages the chunked prefill of the key-value cache, which concurrently generates query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs “margins”, drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs long context reasoning capabilities.

Think: Every chunk has a chance to be attended to/ be at the end of the context at least once. 🎉

📊 Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1mln tokens to an LLMs and waiting 6 min for the first token. 🤯

👩‍💻🧑‍💻 Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill
  • 2 replies
·
reacted to m-ric's post with 🔥 6 months ago
view post
Post
1118
𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭 𝗺𝗼𝗱𝗲𝗹𝘀 𝗳𝗶𝗻𝗮𝗹𝗹𝘆 𝗴𝗲𝘁 𝘁𝗵𝗲𝗶𝗿 𝗖𝗵𝗮𝘁𝗯𝗼𝘁 𝗔𝗿𝗲𝗻𝗮 𝗿𝗮𝗻𝗸𝗶𝗻𝗴 🎖️

Given the impressive benchmarks published my Meta for their Llama-3.1 models, I was curious to see how these models would compare to top proprietary models on Chatbot Arena.

Now we've got the results! LMSys released the ELO derived from thousands of user votes for the new models, and here are the rankings:

💥 405B Model ranks 5th overall, in front of GPT-4-turbo! But behind GPT-4o, Claude-3.5 Sonnet and Gemini-advanced.
👏 70B Model climbs up to 9th rank ! From 1206 ➡️ 1244.
👍 8B Model improves from 1152 ➡️ 1170.

✅ This confirms that Llama-3.1 is a good contender for any task: any of its 3 model size is much cheaper to run than equivalent proprietary models!

For instance, here are the inference prices for the top models;
➤ GPT-4-Turbo inference price from OpenAI: $5/M input tokens, $15/M output tokens
➤ Llama-3.1-405B from HF API (for testing only): 3$/M for input or output tokens (Source linked in the first comment)
➤ Llama-3.1-405B from HF API (for testing only): free ✨

Get a head start on the HF API (resource by @andrewrreed ) 👉 https://huggingface.co/learn/cookbook/enterprise_hub_serverless_inference_api
  • 1 reply
·
reacted to dvilasuero's post with 🤗❤️🚀🔥 8 months ago
view post
Post
8121
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: launching partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, or releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!
·
reacted to lunarflu's post with ❤️ 8 months ago
view post
Post
1957
cooking up something....anyone interested in a daily activity tracker for HF?
·
reacted to tomaarsen's post with 🚀 9 months ago
view post
Post
2400
NuMind has just released 3 new state-of-the-art GLiNER models for Named Entity Recognition/Information Extraction. These GLiNER models allow you to specify any label that you want, and it'll find spans in the text corresponding to your label. It's been shown to work quite well on unusual domains, e.g. celestial entities in my picture.

There are 3 models released:
- numind/NuNER_Zero:
The primary model, SOTA & can detect really long entities.
- numind/NuNER_Zero-span:
Slightly better performance than NuNER Zero, but can't detect entities longer than 12 tokens.
- numind/NuNER_Zero-4k:
Slightly worse than NuNER Zero, but has a context length of 4k tokens.

Some more details about these models in general:
- They are *really* small, orders of magnitude smaller than LLMs, which don't reach this level of performance.
- Because they're small - they're fast: <1s per sentence on free GPUs.
- They have an MIT license: free commercial usage.

Try out the demo here: https://huggingface.co/spaces/numind/NuZero
Or check out all of the models here: numind/nunerzero-zero-shot-ner-662b59803b9b438ff56e49e2

If there's ever a need for me to extract some information from any text: I'll be using these. Great work @Serega6678 !
  • 3 replies
·