Sewy2 (untrained) 640M

Sewy2 is a new MoE architecture that combines the following techniques:

  • DeepSeek-V3
  • nGPT
  • ResFormer
  • NeuTRENO (as in ResFormer)
  • Tanh logit softcapping (as in Gemma 2)
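As a quick illustration of the last item: tanh logit softcapping (as described for Gemma 2) smoothly bounds logits to the interval (-cap, cap) instead of letting them grow unbounded. A minimal sketch in plain Python (the function name and cap value are illustrative, not taken from this repo):

```python
import math

def softcap(logits, cap=30.0):
    # Squash each logit smoothly into (-cap, cap):
    # near zero it is approximately the identity, for large
    # magnitudes it saturates at +/- cap.
    return [cap * math.tanh(x / cap) for x in logits]

# Small logits pass through almost unchanged; huge ones are capped.
print(softcap([0.1, 5.0, 1000.0]))
```

In Gemma 2 this is applied to attention logits and final output logits to keep them in a stable numeric range.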

Architecture:

  • 32 layers
  • 32 attention heads
  • 32 KV heads (full multi-head attention; no KV-head grouping)
  • 64 experts
  • 8 experts activated per token
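
The expert numbers above mean each token is routed to the top 8 of 64 experts per MoE layer. A minimal sketch of that top-k routing step, assuming standard top-k gating (all names and the config constants are illustrative, not read from this repo's code):

```python
# Hypothetical config mirroring the listed architecture.
N_LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 32
N_EXPERTS = 64
TOP_K = 8  # experts activated per token


def route(router_logits, top_k=TOP_K):
    """Return the indices of the top_k experts for one token,
    ranked by router logit (highest first)."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)
    return ranked[:top_k]


# Example: a token's 64 router logits select 8 experts.
example_logits = [(i * 37) % 64 / 64.0 for i in range(N_EXPERTS)]
print(route(example_logits))
```

Only the selected experts' FFNs run for that token, so the active parameter count per token is far smaller than the total parameter count.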