danielpark committed · Commit f5137f5 · Parent(s): 3d23d9e
Update README.md

README.md CHANGED
@@ -11,9 +11,9 @@ tags:
 
 # Expert weights of [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)
 
-Required Weights for
+Required weights for follow-up research.
 
-The original model is **[AI21lab's Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)**, which requires an
+The original model is **[AI21lab's Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)**, which requires more than **80 GB of VRAM**. Unfortunately, this is almost never available via Google Colab or other cloud computing services. Thus, attempts were made to perform **MoE (Mixture of Experts) splitting**, using the following resources as a basis:
 - **Original Model:** [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)
 - **MoE Layer Separation:** Consult [this script](https://github.com/TechxGenus/Jamba-utils/blob/main/dense_downcycling.py) written by [@TechxGenus](https://github.com/TechxGenus) and use [TechxGenus/Jamba-v0.1-9B](https://huggingface.co/TechxGenus/Jamba-v0.1-9B).
 
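For reference, below is a minimal sketch of how expert weights could be filtered out of a locally downloaded Jamba-v0.1 checkpoint, shard by shard. This is not the dense-downcycling script linked above; the `.experts.` key pattern, the local directory names, and the shard layout are assumptions about the checkpoint, not verified details of this repository's extraction process.

```python
# Minimal sketch: copy only the MoE expert tensors out of a local
# Jamba-v0.1 snapshot, shard by shard. The ".experts." key pattern and
# the directory names are assumptions, not verified repository details.
import os

from safetensors import safe_open
from safetensors.torch import save_file

CKPT_DIR = "Jamba-v0.1"            # local snapshot of ai21labs/Jamba-v0.1 (assumed path)
OUT_DIR = "jamba-expert-weights"   # destination for the filtered shards
os.makedirs(OUT_DIR, exist_ok=True)

for fname in sorted(os.listdir(CKPT_DIR)):
    if not fname.endswith(".safetensors"):
        continue
    experts = {}
    # safe_open reads lazily, so only matching tensors are materialized
    with safe_open(os.path.join(CKPT_DIR, fname), framework="pt", device="cpu") as f:
        for key in f.keys():
            if ".experts." in key:  # keep only the MoE expert branches
                experts[key] = f.get_tensor(key)
    if experts:
        save_file(experts, os.path.join(OUT_DIR, fname))
        print(f"{fname}: kept {len(experts)} expert tensors")
```

Filtering shard by shard with safetensors keeps peak memory low, since only the tensors whose keys match are ever loaded into RAM.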