---
title: README
emoji: π
colorFrom: purple
colorTo: gray
sdk: static
pinned: false
---
Multilingual language models are typically large, requiring significant computational resources.
Can we create multilingual models that maintain performance comparable to their larger counterparts while reducing model size and latency and improving inference speed?
Potential Techniques:
- Pruning
  - SparseGPT
  - ShortGPT
- Knowledge Distillation (a minimal loss sketch follows this list)
- Quantization (a post-training sketch follows this list)
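
As a sketch of the distillation route, the snippet below shows a standard knowledge-distillation loss in PyTorch in the style of Hinton et al. (2015): the student is trained against a blend of temperature-softened teacher logits and the ground-truth labels. The temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values chosen for this project.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft (teacher) and hard (label) targets; T and alpha are illustrative."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```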
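
For quantization, one low-effort baseline is PyTorch's post-training dynamic quantization, which stores the weights of `Linear` layers in int8 and dequantizes them on the fly, shrinking the model with no retraining. The checkpoint name below (`xlm-roberta-base`) is only an example multilingual model, not necessarily the one this project targets.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load an example multilingual baseline; any Hugging Face checkpoint works here.
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base")

# Post-training dynamic quantization: Linear-layer weights are kept in int8,
# reducing model size and often improving CPU inference speed.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```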