---
title: README
emoji: π
colorFrom: purple
colorTo: gray
sdk: static
pinned: false
---
Multilingual language models are typically large, requiring significant computational resources.

Can we create multilingual models that match the performance of their larger counterparts while reducing model size and latency, and improving inference speed when serving production traffic at large batch sizes?
# Techniques:
- Pruning (a minimal magnitude-pruning sketch follows this list)
  - Unstructured Pruning
  - Structured Pruning
  - Semi-Structured Pruning
  - Methods Used
    - SparseGPT | [GitHub](https://github.com/VishnuVardhanSaiLanka/sparsegpt/tree/aya)
    - ShortGPT | [KLD-Based Pruning & Perplexity Sensitivities](https://github.com/rsk2327/DistAya/tree/main)
- Knowledge Distillation (see the distillation-loss sketch below)
  - Hidden State-Based Distillation ~ [DistillKit](https://arcee-ai-distillkit.my.canva.site/) | [GitHub](https://github.com/ShayekhBinIslam/DistillKit)
  - Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  - On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
  - Minitron: Compact Language Models via Pruning & Knowledge Distillation
  - DistiLLM: Towards Streamlined Distillation for Large Language Models
- Quantization (see the int8 PTQ sketch below)
  - Quantization-Aware Training (QAT)
  - Post-Training Quantization (PTQ)
  - KV Cache Quantization
  - Weight & Activation Quantization
- Low-Rank Factorization (see the SVD sketch below)
- Fine-Tuning | [GitHub](https://github.com/rsk2327/DistAya/tree/track/fine-tuning)
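
The sketches below are illustrative, not the project's exact pipelines. First, unstructured pruning via PyTorch's built-in `torch.nn.utils.prune` utilities: SparseGPT and ShortGPT use more sophisticated criteria (Hessian-based weight reconstruction and layer-importance scores, respectively), but zeroing out the smallest-magnitude weights is the common baseline. The 30% sparsity level and layer sizes are arbitrary example values.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Zero out the 30% of weights with the smallest L1 magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# Measure the resulting global sparsity.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"Global sparsity: {zeros / total:.1%}")
```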
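Next, a minimal sketch of the temperature-scaled KL distillation loss that logit-based KD builds on. Hidden state-based distillation (as in DistillKit) additionally matches intermediate-layer representations, but this KL term is the common core; the batch size, vocabulary size, and temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then minimize KL(teacher || student).
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

student_logits = torch.randn(4, 32000)  # (batch, vocab) -- dummy values
teacher_logits = torch.randn(4, 32000)
print(distillation_loss(student_logits, teacher_logits))
```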
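For quantization, a minimal post-training sketch: symmetric per-tensor int8 rounding of a weight matrix, plus dequantization at inference time. Production PTQ methods (e.g. GPTQ, AWQ) add calibration data, per-channel scales, and outlier handling; this only illustrates the core round-to-grid idea and its reconstruction error.

```python
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0  # map the largest magnitude to 127
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(512, 512)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"Mean abs quantization error: {(w - w_hat).abs().mean():.5f}")
```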
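Finally, a minimal low-rank factorization sketch: approximate a dense weight matrix with a truncated SVD, so one large matmul becomes two thin ones. The matrix size and rank are arbitrary example values.

```python
import torch

W = torch.randn(1024, 1024)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

r = 64
A = U[:, :r] * S[:r]  # (1024, r) -- fold singular values into the left factor
B = Vh[:r, :]         # (r, 1024)

W_approx = A @ B
rel_err = (W - W_approx).norm() / W.norm()
print(f"Rank-{r} relative error: {rel_err:.3f}")
# Parameters drop from 1024*1024 to 2*1024*64 (~8x fewer).
```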

# Datasets:
An initial set of 7 datasets was unified into a 6.62M-row collection, which includes the following:
- Bangla_Alpaca_Orca: Bangla
- Urdu_Instruct_News_Article_Generation: Urdu
- Urdu_Instruct_News_Headline_Generation: Urdu
- Urdu_Instruct_News_Category_Classification: Urdu
- cidar: Arabic
- Six_Millions_Instruction_Dataset_For_Arabic_Llm_Ft: Arabic
- instructv3: English

## Get in touch with the team:
- Mayank Bhaskar -> [email protected]
- Ahmad Anis -> [email protected]
- Drishti Sharma -> [email protected]
- Vishnu Vardhan -> [email protected]
- Yaya -> [email protected]
- Shayekh Bin Islam -> [email protected]