OpenDILab

community

https://opendilab.net

AI & ML interests

Decision Intelligence & Reinforcement Learning & Deep Learning

OpenDILabCommunity's activity

deepcs233

authored 4 papers 7 days ago

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Paper • 2312.07488 • Published Dec 12, 2023

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Paper • 2403.16999 • Published Mar 25 • 4

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Paper • 2404.13046 • Published Apr 19 • 1

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Paper • 2412.11279 • Published 10 days ago • 12

deepcs233

authored a paper 9 days ago

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published 13 days ago • 21

lunarflu

posted an update 20 days ago

Post

1492

great blogpost! 🔥@wolfram
https://huggingface.co/blog/wolfram/llm-comparison-test-2024-12-04

zjowowen

updated a model 21 days ago

OpenDILabCommunity/LunarLanderContinuous-v2-QGPO

Reinforcement Learning • Updated 21 days ago

zjowowen

updated a dataset about 2 months ago

OpenDILabCommunity/rl_unplugged_dm_control_suite

Updated Oct 30 • 43 • 1

xianbao

posted an update 4 months ago

Post

1705

With the open-weight release of CogVideoX-5B from THUDM (i.e. the GLM team), the Video Generation Model (how about calling it VGM?) field has officially become the next booming "LLM".

What does the landscape look like? What are the other video generation models? The collection below is all you need.

xianbao/video-generation-models-66c350163c74f60f5c412af6

The above video is generated by @a-r-r-o-w with CogVideoX-5B, taken from a nice lookout for the field!

lunarflu

posted an update 4 months ago

Post

1153

@Blane187 could you please modify the title of your blogpost? content is cool, title could be nicer imo https://huggingface.co/blog/Blane187/wtf-is-rvc

3 replies

lunarflu

posted an update 5 months ago

Post

1880

Cool things this week from @huggingface !

🌎AI math olympiad winner NuminaMath is here!
🤗Announcing New Hugging Face and Keras NLP integration
✨UI overhaul to HF tokens!
🧊 Embed our dataset viewer on any webpage!

https://huggingface.co/blog/winning-aimo-progress-prize
https://huggingface.co/blog/keras-nlp-integration
https://huggingface.co/settings/tokens
https://x.com/julien_c/status/1812099420726456457

Check out the full list on our discord! 👇
https://discord.com/invite/JfAtkvEtRb

xianbao

authored a paper 6 months ago

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Paper • 2406.13923 • Published Jun 20 • 21

lunarflu

posted an update 7 months ago

Post

2314

By popular demand, HF activity tracker v1.0 is here! 📊 let's build it together!🤗

Lots of things to improve, feel free to open PRs in the community tab!

good PR ideas:
- track more types of actions that include date+time
- bigger plot
- track discord activity too 🤯
- link github? ⚡

https://huggingface.co/spaces/huggingface-projects/LevelBot

2 replies

lunarflu

posted an update 7 months ago

Post

1962

Weekly highlights for the HF ecosystem!

🚀 Phi 3
🦅 Falcon VLM
🤗 sentence-transformers v3.0 is here! Train and finetune embedding models with multi-GPU training, bf16 support, loss logging, callbacks and more!
🥳 Gradio launch event 6/6! We're launching 1.0 versions of two new libraries, Python + JS client libraries to programmatically query Gradio apps, and several new features making it easier to use Gradio apps in production!
✨ Tools now available in HuggingChat! Use any AI apps built by the community! 🔥
🧊 ML for 3D Course Unit 3 is here! Covering Gaussian splatting, how it fits in the generative 3D pipeline, and hands-on code to build your own demo!

See the full list here!
https://discord.com/channels/879548962464493619/897387888663232554/1245036889539612764

2 replies

lunarflu

posted an update 7 months ago

Post

1934

cooking up something....anyone interested in a daily activity tracker for HF?

12 replies

xianbao

posted an update 7 months ago

Post

1792

Why Apache 2.0 Matters for LLMs 🤔

@01AI_Yi recently switched from a permissive & commercially friendly license to Apache 2.0. And the community loved it! 🚀

@JustinLin610 also ran a poll on model licenses, and the majority voted for Apache 2.0.

Why is it a Big Deal? ⬇️

📚 Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

👩‍💻 Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for non-native developers to understand the implications too.

🔗 Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like model merging with models of different licensing requirements.

🚫 No Permission Needed: Custom licenses often require explicit permission and additional documentation work of filling forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot of interesting discussions in
@JustinLin610's poll: https://x.com/JustinLin610/status/1793559737482764375 which inspired this thread.

Any other thoughts? Let me know ^^

1 reply

xianbao

posted an update 7 months ago

Post

1214

DeepSeek V2 is a big deal, and not only because of its significant improvements to both key components of the Transformer: the attention layer and the FFN layer.

It has also completely disrupted the Chinese LLM market, forcing competitors to drop their prices to 1% of the original price.

---

There are two key components in the Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.

DeepSeek V2 introduces optimizations to both:

The attention layer normally uses a KV cache to avoid repeated computation, but the cache consumes significant GPU RAM, which limits the number of concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which stores only a small latent representation, resulting in substantial RAM savings.
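To get a feel for why caching a small latent vector saves memory, here is a rough back-of-the-envelope sketch. All sizes (32 layers, 32 heads, head dimension 128, latent dimension 512, 2 bytes per element) are hypothetical placeholders and not DeepSeek V2's actual configuration. The latent estimate is also simplified: it only counts one cached latent vector per layer and ignores any other details of MLA.

```python
def kv_cache_bytes_per_token(n_layers, n_heads, head_dim, bytes_per_elem=2):
    """Standard attention: one key and one value vector per head, per layer."""
    return 2 * n_layers * n_heads * head_dim * bytes_per_elem


def latent_cache_bytes_per_token(n_layers, latent_dim, bytes_per_elem=2):
    """Simplified latent cache: one compressed vector per layer."""
    return n_layers * latent_dim * bytes_per_elem


# Hypothetical model sizes, chosen only for illustration.
full = kv_cache_bytes_per_token(n_layers=32, n_heads=32, head_dim=128)
latent = latent_cache_bytes_per_token(n_layers=32, latent_dim=512)

print(f"full KV cache:  {full} bytes per token")
print(f"latent cache:   {latent} bytes per token")
print(f"reduction:      {full / latent:.0f}x")
```

With these made-up numbers the cache per token shrinks by the ratio `2 * n_heads * head_dim / latent_dim`, which is why a smaller cache leaves room for more concurrent requests on the same GPU.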

In the FFN layer, DeepSeek V2 utilizes 162 experts instead of the usual 8 as in Mixtral. This approach segments experts into finer granularity for higher specialization and more accurate knowledge acquisition. Activating only a small subset of experts for each token leads to efficient processing.
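The idea of activating only a few experts per token can be sketched with generic top-k routing. This is a minimal illustration of the general mixture-of-experts technique, not DeepSeek V2's actual router: the 8 toy experts, the router scores, and the helper names `route_top_k` and `moe_forward` are all made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for a single token.
scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(scores, k=2)
output = moe_forward(1.0, experts, scores, k=2)
print(selected)
print(output)
```

Only 2 of the 8 toy experts run for this token, so compute per token stays roughly constant even as the total number of experts grows.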

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and QWen to follow suit, lowering their prices to 1% of their original offerings. Now, users can access these APIs at 1/35th the cost of ChatGPT-4o.

lunarflu

posted an update 7 months ago

Post

1217

https://huggingface.co/unsloth just crossed 1M+ downloads! 🤯

Some of the most popular 👀 :
unsloth/llama-3-8b-bnb-4bit
unsloth/llama-3-8b-Instruct-bnb-4bit
unsloth/mistral-7b-instruct-v0.2-bnb-4bit

xianbao

posted an update 8 months ago

Post

1860

So hard to keep up with the pace!!! Lots of new Chinese fine-tunes are being released on HF

So I asked my agent to create a collection
xianbao/llama3-zh-662ba8503bdfe51948a28403

code: https://colab.research.google.com/drive/1ap6fP-VytZE367Nqk26DeQqgQkYaf-cD#scrollTo=eljRbYb4c92M

Would be nice to run them regularly. Any thoughts / suggestions on where to host this cron job?

1 reply

deepcs233

authored a paper 11 months ago

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Paper • 2402.05935 • Published Feb 8 • 15

Team members 16