Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.

Model Details

Model Description

A compact version of SwarmFormer, optimized for shorter sequences (up to 256 tokens). It consists of:

Token embedding layer with dropout (0.3)
Two SwarmFormer layers
Mean pooling and classification

Developed by: Jordan Legg, Mikus Sturmanis, Takara.ai
Funded by: Takara.ai
Shared by: Takara.ai
Model type: Hierarchical transformer
Language(s): English
License: Not specified
Finetuned from model: None (trained from scratch)

Model Sources

Repository: https://github.com/takara-ai/SwarmFormer
Paper: SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations (Takara.ai Research)
Demo: Not available

Uses

Direct Use

Text classification
Sentiment analysis
Resource-constrained environments

Out-of-Scope Use

Text generation
Machine translation
Inputs longer than 256 tokens
Tasks requiring high precision

Training Details

Training Data

Dataset: IMDB Movie Review
Size: 50,000 samples
Augmentation techniques applied

Training Procedure

Model Architecture Details

Token Embedding Layer:

- Embedding layer (vocab_size → 128)
- Dropout rate: 0.3

Local Swarm Aggregator:

- Input dropout: 0.3
- Local MLP:
  - Linear(128 → 128)
  - GELU
  - Dropout(0.3)
  - Linear(128 → 128)
- Gate network with GELU
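
The local MLP and gating listed above can be illustrated with a small pure-Python sketch. The Linear → GELU → Linear shape follows the list; how the gate values are produced and applied is not specified in this card, so the convex blend in `gated_update` is an assumption, and all function names are hypothetical. Dropout is left out because it is the identity at inference time.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(vec, weight, bias):
    """y = W @ vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def local_mlp(vec, w1, b1, w2, b2):
    """Linear -> GELU -> Linear (dropout omitted for inference)."""
    hidden = [gelu(h) for h in linear(vec, w1, b1)]
    return linear(hidden, w2, b2)

def gated_update(vec, update, gate):
    """ASSUMPTION: a per-feature gate in [0, 1] blends old state and update."""
    return [x + g * (u - x) for x, u, g in zip(vec, update, gate)]

# Toy demo with dimension 4 instead of 128 and identity weights,
# so the MLP output is simply GELU applied to each input feature.
dim = 4
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
zeros = [0.0] * dim
x = [1.0, -1.0, 0.5, 0.0]
update = local_mlp(x, identity, zeros, identity, zeros)
new_x = gated_update(x, update, gate=[0.5] * dim)
print(new_x)
```

With a gate of 0 a token keeps its current representation, and with a gate of 1 it is replaced by the MLP output; intermediate values interpolate between the two.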

Clustering Mechanism:

- Cluster size: 8 tokens
- Mean pooling per cluster
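
As a rough illustration of the clustering step, the sketch below averages token vectors in groups of 8. It assumes clusters are contiguous, non-overlapping blocks of the sequence, which this card does not state explicitly; `cluster_mean_pool` is a hypothetical helper, not part of the released code.

```python
def cluster_mean_pool(tokens, cluster_size=8):
    """Average consecutive groups of `cluster_size` token vectors.

    ASSUMPTION: clusters are contiguous, non-overlapping blocks.
    """
    if len(tokens) % cluster_size != 0:
        raise ValueError("sequence length must be a multiple of cluster_size")
    dim = len(tokens[0])
    clusters = []
    for start in range(0, len(tokens), cluster_size):
        group = tokens[start:start + cluster_size]
        clusters.append(
            [sum(vec[i] for vec in group) / cluster_size for i in range(dim)]
        )
    return clusters

# 256 token vectors of dimension 128 become 32 cluster vectors of dimension 128.
tokens = [[float(t)] * 128 for t in range(256)]
clusters = cluster_mean_pool(tokens)
print(len(clusters), len(clusters[0]))
```

With a sequence length of 256 and a cluster size of 8, each sequence yields 256 / 8 = 32 cluster representations.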

Global Cluster Attention:

- Q/K/V projections: Linear(128 → 128)
- Attention dropout: 0.3
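
The attention over cluster representations can be sketched as standard scaled dot-product attention. This is a simplified illustration under stated assumptions: a single head, scaling by the square root of the dimension, and no learned Q/K/V projections or dropout. None of these details are confirmed by this card, and `cluster_attention` is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cluster_attention(queries, keys, values):
    """Single-head scaled dot-product attention over cluster vectors.

    ASSUMPTION: one head, scores scaled by sqrt(d); the Linear(128 -> 128)
    projections and attention dropout are omitted for brevity.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        )
    return outputs

# 32 clusters of dimension 128. All-zero queries give uniform attention
# weights, so every output is the plain average of the value vectors.
values = [[float(c)] * 128 for c in range(32)]
queries = [[0.0] * 128 for _ in range(32)]
attended = cluster_attention(queries, values, values)
print(len(attended), round(attended[0][0], 6))
```

Because attention runs over 32 clusters rather than 256 tokens, the score matrix has 32 × 32 entries instead of 256 × 256.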

Training Hyperparameters

Embedding dimension: 128
Number of layers: 2
Local update steps: 3
Cluster size: 8
Sequence length: 256
Batch size: 96
Learning rate: 4.76 × 10⁻⁴
Weight decay: 0.0541
Dropout: 0.30
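
For reference, the hyperparameters above can be gathered into a single configuration object. The class and field names below are hypothetical and are not taken from the SwarmFormer repository; only the values come from this card.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SmallConfig:
    """Hyperparameters listed in this card (names are illustrative)."""
    d_model: int = 128
    num_layers: int = 2
    t_local: int = 3            # local update steps
    cluster_size: int = 8
    seq_len: int = 256
    batch_size: int = 96
    learning_rate: float = 4.76e-4
    weight_decay: float = 0.0541
    dropout: float = 0.30

    @property
    def clusters_per_sequence(self) -> int:
        """Number of clusters produced from one full-length sequence."""
        if self.seq_len % self.cluster_size != 0:
            raise ValueError("seq_len must be divisible by cluster_size")
        return self.seq_len // self.cluster_size

    @property
    def tokens_per_batch(self) -> int:
        """Token positions processed in one training batch."""
        return self.batch_size * self.seq_len

config = SmallConfig()
print(config.clusters_per_sequence, config.tokens_per_batch)
```

The two derived values follow directly from the list: 256 / 8 = 32 clusters per sequence and 96 × 256 = 24,576 token positions per batch.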

Evaluation

Results

Accuracy: 86.20%
Precision: 83.46%
Recall: 90.31%
F1: 86.75%
Inference time: 0.36s (25k samples)
Mean batch latency: 3.67ms
Throughput: 45k samples/s
Peak memory: 8GB
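
The reported F1 score is consistent with the reported precision and recall, since F1 is their harmonic mean. A short check:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported above, expressed as fractions.
f1 = f1_score(0.8346, 0.9031)
print(f"{f1:.2%}")  # 86.75%
```

Recall being higher than precision means the model produced more false positives than false negatives on the evaluation set.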

Technical Specifications

Compute Infrastructure

GPU: NVIDIA RTX 2080 Ti
VRAM: 8GB minimum
Training time: 3.6 minutes

How to Get Started

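from swarmformer import SwarmFormerModel

# Instantiate the Small configuration; values match the training hyperparameters above
model = SwarmFormerModel(
    vocab_size=30000,   # vocabulary size
    d_model=128,        # embedding dimension
    seq_len=256,        # sequence length
    cluster_size=8,     # tokens per cluster
    num_layers=2,       # number of SwarmFormer layers
    T_local=3,          # local update steps
)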
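
Since the model is configured for a fixed sequence length of 256, inputs generally need to be truncated or padded to that length before being batched. The helper below is a minimal sketch of that step; the function name and the padding id of 0 are assumptions, because this card does not specify the tokenizer or its padding token.

```python
def pad_or_truncate(token_ids, seq_len=256, pad_id=0):
    """Return exactly `seq_len` ids: cut long inputs, right-pad short ones.

    ASSUMPTION: pad_id=0 is a placeholder; use the tokenizer's real pad id.
    """
    clipped = list(token_ids[:seq_len])
    return clipped + [pad_id] * (seq_len - len(clipped))

short = pad_or_truncate([101, 2023, 3185, 102])
long = pad_or_truncate(list(range(300)))
print(len(short), len(long))
```

Truncation discards everything past token 256, which is one reason longer documents are listed as out of scope.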

Citation

@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}

Model Card Authors

Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

Model Card Contact

[email protected]
