Mamba2 Distilled Model

Version: 1.0
Architecture: MOHAWK LMHead


Overview

This model is a distilled version of SmolLM2-1.7B, converted to the SSM-based Mamba2 architecture with the MOHAWK distillation method: the MLP layers are kept as is, while the attention layers are replaced with Mamba2 layers. It was developed for the paper "On Pruning State-Space LLMs".
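
The checkpoint can be loaded with the Hugging Face transformers library. The sketch below is a minimal, unverified example: it assumes the repository id tGhattas/Smol2-Mamba-1.9B (taken from the model page) and that the repository ships custom Mamba2 modeling code, hence trust_remote_code=True. Check the repository files if loading fails.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from the model page; custom Mamba2 modeling code is
# assumed to be bundled with the checkpoint, hence trust_remote_code=True.
model_id = "tGhattas/Smol2-Mamba-1.9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))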

Evaluation

The model has been benchmarked on several tasks:

Task              Metric      Value   Stderr
ARC Challenge     acc         0.4164  ±0.0144
ARC Easy          acc         0.7492  ±0.0089
HellaSwag         acc         0.4988  ±0.0050
Lambada (OpenAI)  acc         0.5707  ±0.0069
Lambada (OpenAI)  perplexity  7.0794  ±0.1761
PIQA              acc         0.7661  ±0.0099
Winogrande        acc         0.6283  ±0.0136

Note:

  • For accuracy metrics, higher values are better.
  • For perplexity, lower values are better.
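
The task and metric names above match those used by EleutherAI's lm-evaluation-harness. The sketch below shows how comparable numbers could be reproduced with its Python API; the harness itself, the repository id, and the evaluation settings (batch size, few-shot count) are assumptions, not details confirmed by this card.

import lm_eval

# Hypothetical reproduction of the benchmark table with lm-evaluation-harness.
# The repository id and evaluation settings are assumptions; the card does not
# state how the reported numbers were produced.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tGhattas/Smol2-Mamba-1.9B,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag",
           "lambada_openai", "piqa", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)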

Intended Use

  • General NLP Tasks: Suitable for various language understanding and reasoning tasks.
  • Research & Prototyping: Suited to lightweight experiments and efficiency-focused deployments.

Citation

If you use this model, please cite:

@misc{ghattas2025pruningstatespacellms,
  title={On Pruning State-Space LLMs}, 
  author={Tamer Ghattas and Michael Hassid and Roy Schwartz},
  year={2025},
  eprint={2502.18886},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.18886}, 
}

Model Card Last Updated: February 16, 2025
