Mendonca

Dihelson

AI & ML interests

None yet

Organizations

None yet

Dihelson's activity

reacted to Jaward's post with 👀❤️ 6 months ago
Supercool Weekend Read 🤖
Nvidia researchers achieved SOTA LLM compression metrics using pruning and knowledge distillation techniques.

Details on Techniques (Simplified):
They started off with a large pre-trained language model (15B params), then:

1. Estimated the importance of different parts of the model (neurons, attention heads, layers) using activation-based metrics on a small calibration dataset.

2. Pruned (removed) the less important parts of the model to reduce its size.

3. Retrained the pruned model using knowledge distillation, where the original large model acts as a teacher for the smaller pruned model.

4. Used a lightweight neural architecture search to find the best configuration for the pruned model.

5. Repeated this process iteratively to create even smaller models.
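To make the recipe concrete, here is a minimal PyTorch sketch of steps 1-3 on a toy MLP block; the ToyBlock model, the mean-activation importance score, the 0.5 keep ratio, and the temperature-2 KL loss are illustrative assumptions rather than the paper's exact choices (the actual method scores neurons, attention heads, and whole layers of the 15B model and adds the architecture search of step 4).

```python
# Sketch of activation-based importance scoring, pruning, and distillation
# retraining on a toy block. Names and hyperparameters are assumptions for
# illustration, not values from the Minitron paper.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBlock(nn.Module):
    """Stand-in for one transformer MLP block."""

    def __init__(self, d_model=64, d_hidden=256, vocab=100):
        super().__init__()
        self.mlp_in = nn.Linear(d_model, d_hidden)
        self.mlp_out = nn.Linear(d_hidden, vocab)

    def forward(self, x):
        return self.mlp_out(torch.relu(self.mlp_in(x)))


def neuron_importance(model, calib_batches):
    """Step 1: score each hidden neuron by its mean activation on calibration data."""
    scores = torch.zeros(model.mlp_in.out_features)
    model.eval()
    with torch.no_grad():
        for x in calib_batches:
            scores += torch.relu(model.mlp_in(x)).mean(dim=0)
    return scores / len(calib_batches)


def prune_neurons(model, scores, keep_ratio=0.5):
    """Step 2: keep only the highest-scoring neurons, slicing the in-projection
    rows and the matching out-projection columns."""
    k = max(1, int(scores.numel() * keep_ratio))
    keep = torch.topk(scores, k).indices.sort().values
    model.mlp_in.weight = nn.Parameter(model.mlp_in.weight[keep].detach().clone())
    model.mlp_in.bias = nn.Parameter(model.mlp_in.bias[keep].detach().clone())
    model.mlp_out.weight = nn.Parameter(model.mlp_out.weight[:, keep].detach().clone())
    return model


def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Step 3: KL divergence between softened teacher and student distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2


# Toy run: prune a copy of the "teacher", then retrain it against the original.
teacher = ToyBlock()
calib = [torch.randn(8, 64) for _ in range(4)]
student = prune_neurons(copy.deepcopy(teacher), neuron_importance(teacher, calib))
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
for x in calib:
    loss = distill_loss(student(x), teacher(x).detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full pipeline, the pruned-and-distilled model then becomes the starting point for the next round of compression (step 5).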

Cool, giving it a try this weekend 😎
Code: https://github.com/NVlabs/Minitron
Paper: https://arxiv.org/abs/2407.14679
Demo: nvidia/minitron
New activity in TheDrummer/Rocinante-12B-v1.1 6 months ago

Wow

8
#1 opened 6 months ago by Huegli
New activity in mradermacher/model_requests 6 months ago

Llama3.1 gguf model

4
#215 opened 6 months ago by ML-master-123
New activity in TheDrummer/Moistral-11B-v3-GGUF 6 months ago

Very impressive!

2
#4 opened 7 months ago by DanielAdler666