🚀 Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:
- ⚡ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes, and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well.
- 📉 Performance tradeoff: It performs slightly worse than DeBERTav3 on average across my zeroshot classification task collection.
- 🧠 Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k).
- 💡 What’s next? I’m preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.
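
A minimal sketch of how the bf16 and batch-size tips above might be applied with the Hugging Face `transformers` zero-shot pipeline. The post does not name the model repository, so `MODEL_ID` is a hypothetical placeholder. The `batch_size=32` default and the fallback dtypes are also assumptions, not recommendations from the post, and the sketch assumes a `transformers` release that includes ModernBERT support.

```python
from typing import Sequence

# Hypothetical placeholder: replace with the actual model repository ID.
MODEL_ID = "your-org/your-modernbert-zeroshot-model"


def pick_dtype_name(has_cuda: bool, bf16_supported: bool) -> str:
    """Prefer bf16 on GPUs that support it; otherwise fall back (assumed policy)."""
    if has_cuda and bf16_supported:
        return "bfloat16"
    if has_cuda:
        return "float16"
    return "float32"


def classify(
    texts: Sequence[str],
    labels: Sequence[str],
    model_id: str = MODEL_ID,
    batch_size: int = 32,  # assumed value; tune to your GPU memory
):
    # Imported lazily so the helper above can be used without torch installed.
    import torch
    from transformers import pipeline

    has_cuda = torch.cuda.is_available()
    bf16_ok = has_cuda and torch.cuda.is_bf16_supported()
    dtype = getattr(torch, pick_dtype_name(has_cuda, bf16_ok))

    classifier = pipeline(
        "zero-shot-classification",
        model=model_id,
        torch_dtype=dtype,
        device=0 if has_cuda else -1,
    )
    # Passing a list of texts plus batch_size lets the pipeline batch inputs.
    return classifier(list(texts), candidate_labels=list(labels), batch_size=batch_size)
```

Calling `classify(["The new GPU ships next month."], ["technology", "sports"])` with a real model ID should return one result per text, each containing the candidate `labels` sorted by their `scores`.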