Mamba2 Distilled Model

Version: 1.0
Architecture: MOHAWK LMHead


Overview

This model is a distilled version of SmolLM2-1.7B, converted to the SSM-based Mamba2 architecture with the MOHAWK distillation method: the MLP layers are kept as is, while the attention layers are replaced with Mamba2 layers. It was developed for the paper "On Pruning State-Space LLMs".
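
The checkpoint can be loaded with the Hugging Face transformers library. The sketch below is a minimal, unverified example: it assumes the repository id tGhattas/Smol2-Mamba-1.9B (taken from the model page) and that the repository ships custom Mamba2 modeling code, hence trust_remote_code=True. Check the repository files if loading fails.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from the model page; custom Mamba2 modeling code is
# assumed to be bundled with the checkpoint, hence trust_remote_code=True.
model_id = "tGhattas/Smol2-Mamba-1.9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))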

Evaluation

The model has been benchmarked on several tasks:

Task              Metric      Value   Stderr
ARC Challenge     acc         0.4164  ±0.0144
ARC Easy          acc         0.7492  ±0.0089
HellaSwag         acc         0.4988  ±0.0050
Lambada (OpenAI)  acc         0.5707  ±0.0069
Lambada (OpenAI)  perplexity  7.0794  ±0.1761
PIQA              acc         0.7661  ±0.0099
Winogrande        acc         0.6283  ±0.0136

Note:

  • For accuracy metrics, higher values are better.
  • For perplexity, lower values are better.
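
The task and metric names above match those used by EleutherAI's lm-evaluation-harness. The sketch below shows how comparable numbers could be reproduced with its Python API; the harness itself, the repository id, and the evaluation settings (batch size, few-shot count) are assumptions, not details confirmed by this card.

import lm_eval

# Hypothetical reproduction of the benchmark table with lm-evaluation-harness.
# The repository id and evaluation settings are assumptions; the card does not
# state how the reported numbers were produced.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tGhattas/Smol2-Mamba-1.9B,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag",
           "lambada_openai", "piqa", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)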

Intended Use

  • General NLP Tasks: Suitable for various language understanding and reasoning tasks.
  • Research & Prototyping: Suited to lightweight experiments and efficiency-focused deployments.

Citation

If you use this model, please cite:

@misc{ghattas2025pruningstatespacellms,
  title={On Pruning State-Space LLMs}, 
  author={Tamer Ghattas and Michael Hassid and Roy Schwartz},
  year={2025},
  eprint={2502.18886},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.18886}, 
}

Model Card Last Updated: February 16, 2025
