Activity Feed

AI & ML interests

Democratize Spanish NLP and encourage its application to generate social impact 💛

Recent Activity

somosnlp's activity

lewtun 
posted an update 6 days ago
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
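For intuition on what Coconut does, here's a minimal sketch of the core mechanism: instead of decoding a token at each reasoning step, the model's last hidden state is fed back as the next input embedding. It assumes a Hugging Face-style causal LM that accepts `inputs_embeds`; this is an illustration of the idea, not Meta's implementation.

```python
import torch

def coconut_latent_steps(model, inputs_embeds, n_thoughts=4):
    """Continuous latent reasoning sketch: loop the last hidden state
    back in as the next input embedding instead of sampling a token."""
    for _ in range(n_thoughts):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]  # (batch, 1, hidden)
        inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)
    return inputs_embeds  # decode answer tokens normally from here
```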
lewtun 
posted an update 13 days ago
This paper (HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify the correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait", etc., that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3-point boost across medical benchmarks, and SFT on this data already gives a strong improvement.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
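As a rough sketch of the verification step, assuming the OpenAI Python client and a judge prompt of my own wording (not the paper's):

```python
from openai import OpenAI

client = OpenAI()

def verify_cot(question: str, cot: str, reference: str) -> bool:
    """Ask GPT-4o whether a sampled CoT reaches the reference answer;
    exact match fails in medicine because of aliases."""
    prompt = (
        f"Question: {question}\n\nReasoning: {cot}\n\n"
        f"Reference answer: {reference}\n\n"
        "Does the reasoning arrive at the reference answer? Reply YES or NO."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```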
lewtun 
posted an update 26 days ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open-sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
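For a quick feel of the approach before reading the post, here's a minimal best-of-N sketch with vLLM; the model id and the `score_fn` placeholder (standing in for a step-wise reward model) are illustrative, not the exact recipe from the blog:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")  # illustrative model id
params = SamplingParams(n=16, temperature=0.8, max_tokens=1024)

def best_of_n(problem: str, score_fn) -> str:
    """Sample N candidate solutions, then return the one the reward
    model scores highest; score_fn stands in for a PRM call."""
    candidates = llm.generate([problem], params)[0].outputs
    return max(candidates, key=lambda c: score_fn(problem, c.text)).text
```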
dvilasuero 
posted an update about 1 month ago
🌐 Announcing Global-MMLU: an improved, open MMLU dataset with evaluation coverage across 42 languages, built with Argilla and the Hugging Face community.

Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior Técnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.

🏷️ 200+ contributors used Argilla to label MMLU questions where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!

Thanks to this annotation process, the open dataset contains two subsets:

1. 🗽 Culturally Agnostic: no specific regional or cultural knowledge is required.
2. ⚖️ Culturally Sensitive: requires dialect, cultural, or geographic knowledge to answer correctly.

Moreover, we provide high-quality translations for 25 of the 42 languages, thanks again to the community and professional annotators leveraging Argilla on the Hub.

I hope this will ensure a better understanding of the limitations and challenges of making open AI useful for many languages.

Dataset: CohereForAI/Global-MMLU
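To try it out, the dataset loads per language with 🤗 Datasets; I'm assuming here that each of the 42 languages is exposed as a config by its language code, with "es" as an example:

```python
from datasets import load_dataset

# Load the Spanish subset of Global-MMLU ("es" assumed as the config name)
global_mmlu_es = load_dataset("CohereForAI/Global-MMLU", "es", split="test")
print(global_mmlu_es[0])
```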
dvilasuero 
posted an update about 2 months ago
dvilasuero 
posted an update 2 months ago
Build datasets for AI on the Hugging Face Hub—10x easier than ever!

Today, I'm excited to share our biggest feature since we joined Hugging Face.

Here’s how it works:

1. Pick a dataset—upload your own or choose from 240K open datasets.
2. Paste the Hub dataset ID into Argilla and set up your labeling interface.
3. Share the URL with your team or the whole community!

And the best part? It’s:
- No code – no Python needed
- Integrated – all within the Hub
- Scalable – from solo labeling to 100s of contributors

I am incredibly proud of the team for shipping this after weeks of work and many quick iterations.

Let's make this sentence obsolete: "Everyone wants to do the model work, not the data work."


Read, share, and like the HF blog post:
https://huggingface.co/blog/argilla-ui-hub
dvilasuero 
posted an update 3 months ago
Big news! You can now build strong ML models without days of human labelling.

You simply:
- Define your dataset, including annotation guidelines, labels, and fields.
- Optionally label some records manually.
- Use an LLM to auto-label your data with a human (you? your team?) in the loop!

Get started with this blog post:
https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
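As a tiny illustration of the auto-labeling idea (a stand-in, not the blog post's exact recipe, which uses an LLM via distilabel), a zero-shot classifier can propose labels for humans to review:

```python
from transformers import pipeline

# Hypothetical label set; suggestions would be reviewed by humans in Argilla
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
LABELS = ["positive", "negative", "neutral"]

def suggest_label(text: str) -> str:
    """Return the model's top label as a suggestion for human review."""
    return classifier(text, candidate_labels=LABELS)["labels"][0]
```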
plaguss 
posted an update 4 months ago
dvilasuero 
posted an update 4 months ago
Explore FinePersonas visually with Argilla and black-forest-labs/FLUX.1-schnell


Excited to share this space where the community can explore a tiny subset of FinePersonas.

argilla/finepersonas


Dataset built with distilabel and free serverless endpoints.

This is just a first step towards more interesting experiments with FinePersonas. For example, can we use it to assess biases in text2image models?

If you have ideas I'd love to hear them in the comments!
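If you'd rather poke at the personas in code, something like this should work, assuming the full release lives at argilla/FinePersonas-v0.1 with a "persona" text column:

```python
from datasets import load_dataset

# Stream a few personas without downloading the whole dataset
personas = load_dataset("argilla/FinePersonas-v0.1", split="train", streaming=True)
for row in personas.take(3):
    print(row["persona"])
```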

gabrielmbmb 
posted an update 4 months ago
Yesterday @mattshumer released mattshumer/Reflection-Llama-3.1-70B, an impressive model that achieved incredible results on benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning; the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct to generate reasoning instructions.
2. We generate a response again using https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct, but we steer the LLM to generate a specific output format using a custom system prompt. In the system prompt, we instruct the LLM to first think 💭 and produce reflections that help resolve ambiguities. After that, we instruct it to generate an output based on the previous thinking.

In the dataset gabrielmbmb/distilabel-reflection-tuning you can find the 5 rows that I generated with this recipe. You can also find the pipeline code in the file called reflection.py.
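For a flavour of step 2, here's a hedged sketch of such a system prompt plus a chat call; the prompt wording is mine, not the one in reflection.py:

```python
from huggingface_hub import InferenceClient

# Illustrative system prompt in the spirit of the recipe
SYSTEM_PROMPT = """\
First reason inside <thinking> tags, adding <reflection> tags whenever
you need to resolve an ambiguity or double-check a step. Then give your
final answer inside <output> tags."""

client = InferenceClient("meta-llama/Meta-Llama-3.1-70B-Instruct")
response = client.chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How many prime numbers are below 30?"},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```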

gabrielmbmb 
posted an update 5 months ago
distilabel 1.3.0 is out! This release contains many core improvements and new tasks that helped us build argilla/magpie-ultra-v0.1!

Distributed pipeline execution with Ray, new Magpie tasks, reward models, components for dataset diversity based on sentence embeddings, Argilla 2.0 compatibility, and many more features!

Check out the new release on GitHub: https://github.com/argilla-io/distilabel

gabrielmbmb 
posted an update 5 months ago
Just dropped magpie-ultra-v0.1! The first open synthetic dataset generated with Llama 3.1 405B. Created with distilabel, it's our most advanced and compute-intensive pipeline to date. We made the GPUs of the cluster go brrrrr 🚀

argilla/magpie-ultra-v0.1

Take a look and tell us what you think! The models likely to benefit the most from it are smol models 🤗 We will be improving the dataset in upcoming iterations!
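To take that look, the dataset loads directly with 🤗 Datasets (assuming the default train split):

```python
from datasets import load_dataset

magpie_ultra = load_dataset("argilla/magpie-ultra-v0.1", split="train")
print(magpie_ultra)
```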