Sewy2 (untrained) 640m

Sewy2 is a new Mixture-of-Experts (MoE) architecture that combines the following techniques:

  • DeepseekV3
  • nGPT
  • ResFormer
  • NeuTRENO (as used in ResFormer)
  • Tanh logit softcapping (as in Gemma 2)

Architecture:

  • 32 Layers
  • 32 Heads
  • 32 KV heads
  • 64 experts
  • 8 experts per token
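With 64 experts and 8 active experts per token, each token is routed to the top-8 experts by router score. The sketch below shows generic top-k softmax gating under those numbers; the actual Sewy2/DeepSeek-V3 router may differ in detail (e.g. shared experts, gating bias, or sigmoid scoring), so treat this as an assumption-laden illustration:

```python
import math
import random

N_EXPERTS, TOP_K = 64, 8  # from the architecture list above

def route(router_logits):
    """Select the top-k experts for one token and softmax-normalize
    their gate weights (generic top-k MoE routing sketch)."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:TOP_K]
    # Softmax over only the selected experts, with max-subtraction
    # for numerical stability.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
assignment = route(token_logits)  # 8 (expert_index, weight) pairs
```

The token's output is then the weight-averaged sum of the 8 selected experts' outputs, so only 8/64 of the expert parameters are active per token.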