Dokyoon PRO

leeloolee

Eruly

leeloolee's activity

reacted to m-ric's post with 👍 6 days ago

Post

2015

Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years of compute on a single GPU.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."

🛠️ But instead, they just parallelized the training across 24k H100s, which cut it down to just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, and pipeline.
And that is notoriously hard to do, making for bloated code repos that hold together only by magic.
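
The arithmetic behind those figures can be checked with a short sketch. It takes the post's numbers (39 million GPU-hours, 24k GPUs) at face value and assumes perfect scaling with every GPU busy the whole time, so the wall-clock figure is an idealized lower bound rather than a reported training time. The function names are illustrative.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = HOURS_PER_DAY * 365


def single_gpu_years(gpu_hours: float) -> float:
    """Years needed if one GPU did all the work sequentially."""
    return gpu_hours / HOURS_PER_YEAR


def ideal_wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock days assuming perfect scaling (an assumption, not a measurement)."""
    return gpu_hours / num_gpus / HOURS_PER_DAY


TOTAL_GPU_HOURS = 39e6   # figure quoted in the post
NUM_GPUS = 24_000        # "24k H100s", as quoted in the post

years = single_gpu_years(TOTAL_GPU_HOURS)
days = ideal_wall_clock_days(TOTAL_GPU_HOURS, NUM_GPUS)

print(f"single GPU: {years:,.0f} years")
print(f"{NUM_GPUS:,} GPUs, ideal scaling: {days:.1f} days")
```

Under these assumptions the single-GPU figure lands near 4.5 thousand years and the ideal parallel run at a bit over two months; real runs take longer because of communication overhead, restarts, and imperfect utilization.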

🤏 But now we don't need huge repos anymore! Instead of building mega training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team built Nanotron, already widely used in industry.
And now a team has released Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code. It is a real engineering feat, and it makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful:
Measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this)
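
For readers who want to see how an MFU number is put together, here is a minimal sketch. It is not Picotron's own benchmark code. It assumes the common approximation of 6 FLOPs per parameter per token for a forward plus backward pass (which ignores attention FLOPs), and a dense BF16 peak of roughly 989 TFLOP/s per H100 SXM. The throughput value is hypothetical, chosen only to show what ~50% would look like; it is not a measurement.

```python
def model_flops_utilization(
    num_params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """MFU = achieved model FLOP/s divided by the hardware's peak FLOP/s.

    Uses the 6 * N FLOPs-per-token approximation for training
    (forward + backward), ignoring attention FLOPs.
    """
    achieved_flops_per_second = 6.0 * num_params * tokens_per_second
    peak_flops_per_second = num_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Assumption: approximate dense BF16 peak of one H100 SXM.
H100_PEAK_BF16_FLOPS = 989e12

# Hypothetical throughput, NOT a Picotron measurement.
HYPOTHETICAL_TOKENS_PER_SECOND = 390_000

mfu = model_flops_utilization(
    num_params=1.7e9,  # SmolLM-1.7B
    tokens_per_second=HYPOTHETICAL_TOKENS_PER_SECOND,
    num_gpus=8,
    peak_flops_per_gpu=H100_PEAK_BF16_FLOPS,
)
print(f"MFU: {mfu:.1%}")
```

Plugging in a measured tokens-per-second figure and the right peak for your hardware gives a comparable number; the result is sensitive to which peak value and FLOPs-per-token formula you pick, so it is worth stating both when reporting MFU.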

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

1 reply

reacted to alimotahharynia's post with 🔥 6 days ago

Post

1541

Here's the space for our new article that leverages LLMs with reinforcement learning to design high-quality small molecules. Check it out at https://huggingface.co/spaces/alimotahharynia/GPT-2-Drug-Generator. You can also access the article here: https://arxiv.org/abs/2411.14157.
I would be happy to receive your feedback.

upvoted a paper 6 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published 14 days ago • 41

reacted to cutechicken's post with ❤️ 6 days ago

Post

2820

🚀 RAGOndevice: High-Performance Local AI Document Analysis Assistant
💫 Core Value
RAGOndevice is a high-performance AI system running locally without cloud dependency. Using CohereForAI's optimized 7B model, it enables professional-grade document analysis on standard PCs. ✨
🌟 Ondevice AI Advantages
1. 🔋 Efficient Resource Utilization

🎯 Optimized 7B Model: Runs on standard PCs
⚡ Local Processing: Instant response without cloud
💻 Low-Spec Compatible: Performs well on regular GPUs
🔄 Optimized Memory: Ensures stable operation

2. 🛡️ Data Security & Cost Efficiency

🔒 Complete Privacy: No external data transmission
🌐 Offline Operation: No internet required
💰 No Subscription: One-time installation
⚙️ Resource Optimization: Uses existing hardware

🎮 Key Features
1. 📊 Powerful Document Analysis

📁 Multi-Format Support: TXT, CSV, PDF, Parquet
🧠 Intelligent Analysis: Automatic structure recognition
👁️ OCR Support: Advanced PDF text extraction
💬 Real-time Chat: Natural language interaction

2. 🔍 Local RAG System

🎯 Efficient Search: TF-IDF based local search
🧩 Context Understanding: Accurate information retrieval
📚 Wikipedia Integration: Rich background knowledge
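
As a rough illustration of what "TF-IDF based local search" can mean, here is a minimal pure-Python sketch. It is not RAGOndevice's implementation: the whitespace tokenizer, the smoothed IDF variant, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_index(docs: list[str]):
    """Return (idf, vectors): smoothed IDF per term and one TF-IDF dict per doc."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return idf, vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str], idf: dict, vectors: list, top_k: int = 3):
    """Rank docs by cosine similarity to the query; drop zero-score docs."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return []
    q_vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(docs[i], score) for score, i in scored[:top_k] if score > 0.0]


docs = [
    "the quarterly revenue report",
    "notes on gradient descent",
    "revenue forecast for next year",
]
idf, vectors = build_index(docs)
results = search("revenue report", docs, idf, vectors)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

Everything here runs on the CPU with no network access, which is the property a local RAG setup relies on; in a full pipeline the top-ranked passages would then be passed to the language model as context.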

🎯 Use Cases

🏢 Enterprise: Secure confidential document processing
🔬 Personal Research: Private data analysis
📚 Education: Personal learning material analysis
💻 Development: Local codebase analysis

⭐ Differentiators

🏃‍♂️ Independent Operation: Zero cloud dependency
⚡ Instant Response: No network latency
🔐 Complete Security: Full data control
💎 Cost Efficiency: No ongoing costs

🔮 Future Plans

🚀 Enhanced model optimization
📚 Local knowledge base expansion
⚡ Hardware optimization
📁 Extended file support

🌟 RAGOndevice democratizes high-performance AI, providing the optimal local AI solution for security-sensitive environments. 🚀

🔥 Power of Local AI: Experience enterprise-grade AI capabilities right on your device!

VIDraft/RAGOndevice

liked a dataset 7 days ago

echo840/OCRBench

Viewer • Updated 7 days ago • 1k • 4.46k • 11

upvoted a paper 8 days ago

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22 • 15

upvoted a collection 8 days ago

Multimodal-SAE

Collection

A collection of SAEs hooked onto LLaVA • 4 items • Updated 30 days ago • 4

liked a model 8 days ago

U4R/StructTable-InternVL2-1B

Image-to-Text • Updated 13 days ago • 1.87k • 27

upvoted a collection 9 days ago

GUI agents

Collection

A collection of papers on GUI agents • 3 items • Updated 11 days ago • 5

liked a model 9 days ago

google/Gemma-Embeddings-v1.0

Updated 8 days ago • 404 • 104

reacted to julien-c's post with 🔥 14 days ago

Post

7575

After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free and, barring blatant abuse, unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team