Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.

Model Details

Model Description

Compact version of SwarmFormer with:

  • Token embedding layer with dropout (0.3)
  • Two SwarmFormer layers
  • Mean pooling and classification
  • Optimized for shorter sequences

  • Developed by: Jordan Legg, Mikus Sturmanis, Takara.ai
  • Funded by: Takara.ai
  • Shared by: Takara.ai
  • Model type: Hierarchical transformer
  • Language(s): English
  • License: Not specified
  • Finetuned from model: Trained from scratch

Model Sources

Uses

Direct Use

  • Text classification
  • Sentiment analysis
  • Resource-constrained environments

Out-of-Scope Use

  • Text generation
  • Machine translation
  • Tasks requiring >256 tokens
  • Tasks requiring high precision

Training Details

Training Data

  • Dataset: IMDB Movie Review
  • Size: 50,000 samples
  • Augmentation techniques applied
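
The card does not describe the data pipeline or which augmentation techniques were used. As a minimal sketch, the IMDB dataset referenced above is available through the Hugging Face datasets library:

from datasets import load_dataset

# 25,000 training + 25,000 test reviews, labelled positive/negative (50,000 total).
imdb = load_dataset("imdb")
train, test = imdb["train"], imdb["test"]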

Training Procedure

Model Architecture Details

  1. Token Embedding Layer:

    - Embedding layer (vocab_size → 128)
    - Dropout rate: 0.3

  2. Local Swarm Aggregator:

    - Input dropout: 0.3
    - Local MLP:
      - Linear(128 → 128)
      - GELU
      - Dropout(0.3)
      - Linear(128 → 128)
    - Gate network with GELU

  3. Clustering Mechanism:

    - Cluster size: 8 tokens
    - Mean pooling per cluster

  4. Global Cluster Attention:

    - Q/K/V projections: Linear(128 → 128)
    - Attention dropout: 0.3
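
The description above maps roughly onto the following PyTorch modules. This is a minimal sketch for orientation only: the class names, gating formula, and residual wiring are assumptions, not the swarmformer package's actual implementation.

import torch
import torch.nn as nn

D_MODEL, CLUSTER_SIZE, DROPOUT = 128, 8, 0.3

class LocalSwarmAggregator(nn.Module):
    # Per-token local update: input dropout, two-layer MLP, GELU gate (wiring assumed).
    def __init__(self, d=D_MODEL, p=DROPOUT):
        super().__init__()
        self.drop = nn.Dropout(p)
        self.mlp = nn.Sequential(
            nn.Linear(d, d), nn.GELU(), nn.Dropout(p), nn.Linear(d, d)
        )
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.GELU())

    def forward(self, x):  # x: (batch, seq_len, d)
        h = self.mlp(self.drop(x))
        g = self.gate(torch.cat([x, h], dim=-1))
        return x + g * h   # gated residual update (assumption)

def pool_clusters(x, cluster_size=CLUSTER_SIZE):
    # Mean-pool consecutive groups of `cluster_size` tokens into cluster vectors.
    b, n, d = x.shape
    return x.view(b, n // cluster_size, cluster_size, d).mean(dim=2)

class GlobalClusterAttention(nn.Module):
    # Single-head attention over the pooled cluster representations.
    def __init__(self, d=D_MODEL, p=DROPOUT):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)
        self.drop = nn.Dropout(p)

    def forward(self, c):  # c: (batch, n_clusters, d)
        q, k, v = self.q(c), self.k(c), self.v(c)
        attn = torch.softmax(q @ k.transpose(-2, -1) / c.size(-1) ** 0.5, dim=-1)
        return self.drop(attn) @ v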
    

Training Hyperparameters

  • Embedding dimension: 128
  • Number of layers: 2
  • Local update steps: 3
  • Cluster size: 8
  • Sequence length: 256
  • Batch size: 96
  • Learning rate: 4.76 × 10⁻⁴
  • Weight decay: 0.0541
  • Dropout: 0.30
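
As an illustration only, these hyperparameters map onto a setup like the following. The optimizer is an assumption (AdamW is a common default); this card does not state which one was used.

import torch
from swarmformer import SwarmFormerModel  # see "How to Get Started" below

model = SwarmFormerModel(vocab_size=30000, d_model=128, seq_len=256,
                         cluster_size=8, num_layers=2, T_local=3)

# Only the learning rate, weight decay, and batch size come from the list above;
# the optimizer itself is assumed.
optimizer = torch.optim.AdamW(model.parameters(), lr=4.76e-4, weight_decay=0.0541)
BATCH_SIZE = 96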

Evaluation

Results

  • Accuracy: 86.20%
  • Precision: 83.46%
  • Recall: 90.31%
  • F1: 86.75%
  • Inference time: 0.36 s (25,000 samples)
  • Mean batch latency: 3.67 ms
  • Throughput: 45,000 samples/s
  • Peak memory: 8 GB
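
The evaluation script is not part of this card; a minimal sketch of how the classification metrics above could be computed from model predictions, using scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def classification_metrics(y_true, y_pred):
    # Binary positive/negative sentiment labels.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }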

Technical Specifications

Compute Infrastructure

  • GPU: NVIDIA RTX 2080 Ti
  • VRAM: 8 GB minimum
  • Training time: 3.6 minutes

How to Get Started

from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,   # tokenizer vocabulary size
    d_model=128,        # embedding dimension
    seq_len=256,        # maximum sequence length
    cluster_size=8,     # tokens per cluster
    num_layers=2,       # SwarmFormer layers
    T_local=3           # local update steps per layer
)
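
A hypothetical inference call follows, assuming the model takes a (batch, seq_len) tensor of token ids and returns class logits in the usual PyTorch style; check the swarmformer documentation for the actual tokenization and forward interface.

import torch

input_ids = torch.randint(0, 30000, (1, 256))  # dummy token ids, shape (batch, seq_len)
with torch.no_grad():
    logits = model(input_ids)                  # assumed to return class logits
prediction = logits.argmax(dim=-1)             # 0 = negative, 1 = positive (assumed)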

Citation

@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}

Model Card Authors

Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

Model Card Contact

[email protected]
