Aarushhh/SEWY2-640M-untrained

Tags: Text Generation · Transformers · Safetensors · Sewy_v2 · conversational · custom_code
License: cc-by-nc-sa-4.0
Sewy2 (untrained) 640M

Sewy2 is a new Mixture-of-Experts (MoE) architecture that combines the following techniques:
- DeepseekV3
- nGPT
- ResFormer
- NeuTRENO (as in ResFormer)
- Tanh logit soft-capping (as in Gemma 2)
Architecture:

- 32 layers
- 32 attention heads
- 32 KV heads
- 64 experts
- 8 experts active per token
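With 64 experts and 8 active per token, each token's hidden state is scored against every expert and only the top 8 are used, with their gate weights renormalized. A sketch of that top-k routing step under assumed shapes and names (this is not the repo's actual routing code; the hidden dimension of 128 is purely illustrative):

```python
import numpy as np

def topk_route(hidden, gate_w, k=8):
    """Top-k expert routing sketch.

    hidden: (tokens, dim) token representations
    gate_w: (dim, n_experts) router projection
    Returns indices of the k selected experts per token and their
    renormalized gate weights (each row sums to 1).
    """
    scores = hidden @ gate_w                           # (tokens, n_experts)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    topk_idx = np.argsort(probs, axis=-1)[:, -k:]      # k highest-prob experts
    topk_p = np.take_along_axis(probs, topk_idx, axis=-1)
    topk_p /= topk_p.sum(axis=-1, keepdims=True)       # renormalize gates
    return topk_idx, topk_p

# 64 experts, 8 active per token, matching this model's configuration.
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 128))                      # 4 tokens
w = rng.standard_normal((128, 64))                     # router weights
idx, gates = topk_route(h, w, k=8)
```

The selected experts' outputs would then be combined as a weighted sum using these gate values, so only 8 of the 64 expert FFNs run per token.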
Model size: 640M params · Tensor type: F32 (Safetensors)
Note: the serverless Inference API does not yet support model repos that contain custom code.