README.md · DisOOM/Qwen1.5-55B-Chat-Cut at c58459e523017ad569b14ceabb793d957f66e4f2

metadata

license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE

--Qwen1.5-55B-Chat-Cut--

-It's an experimental model made with mergekit-

I removed 20 of the 80 layers from Qwen1.5-72B-Chat and got a 55B model with 60 layers. It still works well, but with some degradation: compared to the 72B model, it seems slightly more confused and its logic is somewhat more disordered, though I see no discernible loss in writing ability. I also tried removing more layers. When Qwen1.5-72B-Chat is cut down to 40B+, it becomes noticeably chaotic and foolish but still functions. When cut down to 30B+, it completely collapses and only generates meaningless gibberish.
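
As a rough sanity check on the "55B" figure, the parameter count can be estimated from the layer count. This is only a sketch: the hidden size, MLP size, and vocabulary size below are assumed from the published Qwen1.5-72B configuration and are not stated in this README, and the function name `estimate_params` is made up for illustration.

```python
# Assumed Qwen1.5-72B config values (not taken from this README).
HIDDEN = 8192          # hidden_size
INTERMEDIATE = 24576   # intermediate_size of the gated MLP
VOCAB = 152064         # vocab_size


def estimate_params(num_layers: int) -> int:
    """Estimate total parameters for a Qwen1.5-style decoder with num_layers layers."""
    # q, k, v, o projection weights plus q/k/v biases (no grouped-query attention assumed)
    attn = 4 * HIDDEN * HIDDEN + 3 * HIDDEN
    # gate, up and down projections of the MLP
    mlp = 3 * HIDDEN * INTERMEDIATE
    # two RMSNorm weight vectors per layer
    norms = 2 * HIDDEN
    per_layer = attn + mlp + norms
    # input embeddings, an untied output head, and the final norm
    shared = 2 * VOCAB * HIDDEN + HIDDEN
    return num_layers * per_layer + shared


for n in (80, 60):
    print(f"{n} layers: {estimate_params(n) / 1e9:.1f}B parameters")
```

Under these assumptions the estimate is about 72.3B for 80 layers and about 54.8B for 60 layers, which is consistent with calling the cut model 55B.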

I'm not sure whether this depends on the position of the removed layers. A more careful excision method might yield a model with fewer parameters that still works.
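
To try other excision patterns, a small helper can turn a set of layers to drop into mergekit `slices`. This is a minimal sketch, not something that was used for this model: the function names, the placeholder model path, and the "drop every fourth layer" pattern are all hypothetical and untested on the actual weights.

```python
def slices_for(total_layers, drop):
    """Return half-open [start, end) ranges covering every layer not in `drop`."""
    ranges, start = [], None
    for i in range(total_layers + 1):
        keep = i < total_layers and i not in drop
        if keep and start is None:
            start = i                    # a run of kept layers begins here
        elif not keep and start is not None:
            ranges.append([start, i])    # the run ended just before layer i
            start = None
    return ranges


def to_yaml(model_path, ranges, dtype="float16"):
    """Format the ranges as a passthrough merge config in the same shape as this README's."""
    lines = [f"dtype: {dtype}", "merge_method: passthrough", "slices:"]
    for start, end in ranges:
        lines += [
            "- sources:",
            f"  - layer_range: [{start}, {end}]",
            f"    model: {model_path}",
        ]
    return "\n".join(lines)


# Hypothetical alternative: spread the 20 removed layers out instead of
# removing two contiguous blocks of ten.
drop = set(range(2, 80, 4))  # layers 2, 6, ..., 78
print(to_yaml("path/to/Qwen1.5-72B-Chat", slices_for(80, drop)))
```

Whether a spread-out pattern degrades the model less than contiguous blocks is an open question here; the helper only makes it easier to generate configs for comparison.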

-Merge Configuration

The model was produced with the following mergekit YAML config:

dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: D:\Qwen\Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: D:\Qwen\Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: D:\Qwen\Qwen1.5-72B-Chat
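
The config does not say directly which layers are removed, so the short sketch below works it out from the three `layer_range` entries. It assumes each range is half-open (start included, end excluded), which matches the 60-layer total reported above; the variable names are for illustration only.

```python
TOTAL_LAYERS = 80  # layer count of the source model, inferred from the last range ending at 80
layer_ranges = [(0, 20), (30, 50), (60, 80)]  # copied from the config above

# Expand each half-open range into the layer indices it keeps.
kept = [i for start, end in layer_ranges for i in range(start, end)]
kept_set = set(kept)
dropped = [i for i in range(TOTAL_LAYERS) if i not in kept_set]

print(f"kept {len(kept)} layers")
print(f"dropped layers: {dropped}")
```

Read this way, the merge keeps 60 layers and drops two contiguous blocks of ten, layers 20 to 29 and layers 50 to 59.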

-Performance

Note: I don't have the means to run benchmarks, and I haven't been able to use the model extensively, so my test results might not be accurate.

It still operates like a fully functional model, but as mentioned above, it has become somewhat dumber and a bit more chaotic, though overall it doesn't perform badly. I can't say how large the degradation is because I have no means of quantifying my tests.