---
title: README
emoji: π
colorFrom: purple
colorTo: gray
sdk: static
pinned: false
---
Multilingual language models are typically large and require significant computational resources.

Can we create multilingual models that maintain performance comparable to their larger counterparts while reducing model size, lowering latency, and speeding up inference?
Potential techniques:
- Pruning
  - SparseGPT
  - ShortGPT
- Knowledge Distillation
- Quantization
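To make two of these techniques concrete, here is a minimal NumPy sketch of magnitude pruning (zeroing the smallest-magnitude weights) and symmetric per-tensor int8 quantization. This is an illustration of the underlying ideas, not a drop-in method for large language models; the function names, the 50% sparsity setting, and the random weight matrix are all assumptions for the example.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # Zero out the smallest-magnitude weights until the target sparsity is reached.
    k = int(w.size * sparsity)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize_int8(w):
    # Symmetric per-tensor quantization: map floats onto [-127, 127] with one scale.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)

pruned = magnitude_prune(w)          # ~50% of entries set to zero
q, scale = quantize_int8(w)          # int8 codes plus one float scale
w_hat = dequantize(q, scale)         # reconstruction error bounded by the scale
```

Production systems (e.g. SparseGPT) choose which weights to drop using second-order information rather than raw magnitude, but the storage and compute savings come from the same two mechanisms shown here.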