Most similar existing model architecture?

#3
by cckm - opened

I would love to try this model on inference platforms like llama.cpp and MLC. However, these platforms require some custom code for model conversion, so it would be easiest if I could start from the conversion code of an existing model, and then adapt it for MobiLlama. Which model's conversion code would you recommend I start from, and what are the key changes I need to pay attention to?

These are the architectures currently converted by llama.cpp:
https://github.com/ggerganov/llama.cpp/blob/052051d8ae4639a1c3c61e7da3237bcc572469d4/convert-hf-to-gguf.py#L178

and by MLC:
https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_chat/model

Ah, I see it now. The biggest change is the parameter savings from sharing a single MLP across all layers.
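For anyone adapting an existing converter, here is a minimal sketch of what that change implies, assuming you start from a Llama-style converter. Several details are assumptions and not taken from the MobiLlama checkpoint: the `model.shared_mlp.` prefix, the SwiGLU MLP layout (gate, up, and down projections, no biases), and the helper names `mlp_params`, `shared_mlp_savings`, and `unshare_mlp`, which are all hypothetical. The per-layer `model.layers.{i}.mlp.*` names follow the Hugging Face Llama convention. The parameter count shows where the savings come from. The rewrite maps the one shared MLP onto the per-layer names a Llama converter expects.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")  # any tensor type (torch.Tensor, numpy array, ...)

# ASSUMPTION: the shared MLP is stored once under this prefix. This name is
# hypothetical; check the real MobiLlama tensor names before using this.
SHARED_MLP_PREFIX = "model.shared_mlp."
# Per-layer MLP names as used by Hugging Face Llama checkpoints.
LLAMA_MLP_TEMPLATE = "model.layers.{layer}.mlp.{proj}"


def mlp_params(d_model: int, d_ff: int) -> int:
    """Weights in one SwiGLU MLP: gate, up and down projections, no biases."""
    return 3 * d_model * d_ff


def shared_mlp_savings(n_layers: int, d_model: int, d_ff: int) -> int:
    """Parameters saved by one shared MLP instead of n_layers separate MLPs."""
    return (n_layers - 1) * mlp_params(d_model, d_ff)


def unshare_mlp(state: Mapping[str, T], n_layers: int) -> Dict[str, T]:
    """Rewrite a shared-MLP state dict into per-layer Llama-style names.

    Every layer entry points at the same shared tensor object, so an
    unmodified Llama converter can consume the result. Writing it to disk
    stores the MLP weights n_layers times, giving up the savings in the file.
    """
    out: Dict[str, T] = {}
    for name, tensor in state.items():
        if name.startswith(SHARED_MLP_PREFIX):
            proj = name[len(SHARED_MLP_PREFIX):]  # e.g. "gate_proj.weight"
            for layer in range(n_layers):
                out[LLAMA_MLP_TEMPLATE.format(layer=layer, proj=proj)] = tensor
        else:
            out[name] = tensor  # attention, norms, embeddings pass through
    return out


if __name__ == "__main__":
    # Toy state dict with strings standing in for tensors.
    toy = {
        "model.layers.0.self_attn.q_proj.weight": "q0",
        "model.layers.1.self_attn.q_proj.weight": "q1",
        "model.shared_mlp.gate_proj.weight": "gate",
    }
    for key, value in sorted(unshare_mlp(toy, n_layers=2).items()):
        print(key, "->", value)
    # Toy dimensions only, not MobiLlama's real configuration.
    print("params saved:", shared_mlp_savings(n_layers=3, d_model=4, d_ff=8))
```

Duplicating the tensors is the least-effort path, because the Llama conversion code can then be reused largely unchanged, but the converted file loses the size savings. Keeping them would presumably mean storing the MLP once and changing the runtime's model graph so every layer references the same weights.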

cckm changed discussion status to closed
