Pedro Cabral

cabralski

AI & ML interests

I love computer science.



cabralski's activity

liked a model 5 days ago

microsoft/phi-4

Text Generation • Updated 5 days ago • 49.9k • 1.14k

upvoted a paper 22 days ago

ORLM: Training Large Language Models for Optimization Modeling

Paper • 2405.17743 • Published May 28, 2024 • 2

updated a dataset 22 days ago

cabralski/IndustryOR-PTBR

Viewer • Updated 22 days ago • 100 • 29

liked a dataset 22 days ago

CardinalOperations/IndustryOR

Viewer • Updated May 29, 2024 • 100 • 60 • 7

reacted to singhsidhukuldeep's post with 🧠 23 days ago


Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!

The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:

>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on the entropy of the next-byte prediction, allocating more compute where the data is harder to predict. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
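
The entropy-driven grouping described above can be illustrated with a small sketch. It is not Meta's implementation: the post does not say how entropy is estimated, so the hypothetical `next_byte_entropies` helper below substitutes bigram counts taken from the input itself for a learned byte-level model, and the function names and the `threshold` parameter are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def next_byte_entropies(data: bytes) -> list[float]:
    """Entropy in bits of the next-byte distribution at each position.

    Stand-in estimator (assumption): bigram counts gathered from `data`
    itself, conditioned on the previous byte only.
    """
    following = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        following[prev][nxt] += 1

    entropies = []
    for i in range(len(data)):
        if i == 0:
            # No context for the first byte: treat it as maximally uncertain.
            entropies.append(8.0)
            continue
        counts = following[data[i - 1]]
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def patch_by_entropy(data: bytes, threshold: float) -> list[bytes]:
    """Start a new patch wherever the estimated entropy exceeds `threshold`."""
    entropies = next_byte_entropies(data)
    patches = []
    start = 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


if __name__ == "__main__":
    sample = b"aaaaaaaa the quick brown fox"
    for patch in patch_by_entropy(sample, threshold=1.0):
        print(patch)
```

In this sketch a lower threshold produces more, shorter patches, while a higher threshold merges predictable runs into longer ones, so fewer patches reach the expensive part of the model.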

Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes

>> Technical Advantages
• Matches the performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models

>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
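
One plausible reading of "hash n-gram embeddings" is sketched below: each byte's embedding is augmented with embeddings looked up by hashing the n-grams that end at that byte. Everything specific here is an assumption for illustration, not a detail from the post: the `zlib.crc32` hash, the table size `NUM_BUCKETS`, the n-gram lengths in `NGRAM_SIZES`, the random (untrained) tables, and combining by summation.

```python
import random
import zlib

DIM = 4                  # embedding width, kept tiny for illustration
NUM_BUCKETS = 1024       # rows per hashed n-gram table (assumed size)
NGRAM_SIZES = (3, 4, 5)  # n-gram lengths to hash (assumed)


def make_table(rows: int, dim: int, seed: int) -> list[list[float]]:
    """Build a random embedding table; a real model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


BYTE_TABLE = make_table(256, DIM, seed=0)
NGRAM_TABLES = {n: make_table(NUM_BUCKETS, DIM, seed=n) for n in NGRAM_SIZES}


def bucket(ngram: bytes) -> int:
    """Map an n-gram to a table row; CRC32 is a stand-in hash function."""
    return zlib.crc32(ngram) % NUM_BUCKETS


def embed(data: bytes) -> list[list[float]]:
    """Return one vector per byte: byte embedding plus hashed n-gram embeddings."""
    vectors = []
    for i, byte in enumerate(data):
        vec = list(BYTE_TABLE[byte])
        for n in NGRAM_SIZES:
            if i + 1 >= n:  # only when a full n-gram ends at position i
                row = NGRAM_TABLES[n][bucket(data[i + 1 - n : i + 1])]
                vec = [v + r for v, r in zip(vec, row)]
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    for vector in embed(b"hello"):
        print([round(x, 3) for x in vector])
```

Because the tables have a fixed number of rows, distinct n-grams can collide in the same bucket; the hashing trades exactness for a bounded vocabulary-free lookup.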

This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!

2 replies

upvoted a paper 24 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 25 days ago • 339

reacted to julien-c's post with 🔥 about 1 month ago


After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team

28 replies