Update README.md
README.md CHANGED
@@ -5,7 +5,7 @@ license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
 ---
 **--Qwen1.5-55B-Chat-Cut--**
 
-
+**-It's an experimental model made with mergekit-**
 
 **I removed 20 layers of Qwen1.5-72B-Chat and got a 55B model with 60 layers. It still works well, but with some degradation: compared to the 72B model it is slightly more confused and its logic is somewhat more disordered, though there seems to be no discernible loss in writing ability. I also tried removing more layers. When Qwen1.5-72B-Chat is cut down to 40B+, it becomes extremely chaotic and foolish but still functions; when cut down to 30B+, it collapses completely and only generates meaningless gibberish.**
 
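The commit does not include the merge configuration itself. As a rough illustration of the layer pruning described above, a minimal mergekit passthrough config might look like the sketch below: Qwen1.5-72B-Chat has 80 transformer layers, so dropping 20 leaves the 60 mentioned in the card. The specific `layer_range` cut shown here (layers 40-59) is an assumption for illustration, not the author's actual recipe.

```yaml
# Hypothetical mergekit passthrough config: keep layers 0-39 and 60-79 of
# Qwen1.5-72B-Chat (80 layers total), dropping 20 layers to leave 60.
# The actual cut points used for Qwen1.5-55B-Chat-Cut are not stated in the card.
slices:
  - sources:
      - model: Qwen/Qwen1.5-72B-Chat
        layer_range: [0, 40]
  - sources:
      - model: Qwen/Qwen1.5-72B-Chat
        layer_range: [60, 80]
merge_method: passthrough
dtype: bfloat16
```

A config like this would be applied with mergekit's `mergekit-yaml` command (e.g. `mergekit-yaml config.yml ./Qwen1.5-55B-Chat-Cut`) to write out the pruned checkpoint.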