---
title: README
emoji: 🚀
colorFrom: purple
colorTo: gray
sdk: static
pinned: false
---

Multilingual language models face many deployment challenges.

![Deployment Challenges](DeploymentChallenges.png)

Can we build multilingual models that match the performance of their larger counterparts while reducing model size, latency, and inference cost when serving large batch sizes in production?

![Memory variation over time](MemoryVariations(Latency).png)

# Techniques:

- Pruning
  - Unstructured Pruning
  - Structured Pruning
  - Semi-Structured Pruning
  - Methods Used
    - SparseGPT | [GitHub](https://github.com/VishnuVardhanSaiLanka/sparsegpt/tree/aya)
    - ShortGPT | [KLD-Based Pruning & Perplexity Sensitivities](https://github.com/rsk2327/DistAya/tree/main)
- Knowledge Distillation
  - Hidden-State-Based Distillation ~ [DistillKit](https://arcee-ai-distillkit.my.canva.site/) | [GitHub](https://github.com/ShayekhBinIslam/DistillKit)
  - Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo-Labelling
  - On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
  - Minitron: Compact Language Models via Pruning & Knowledge Distillation
  - DistiLLM: Towards Streamlined Distillation for Large Language Models
- Quantization
  - Quantization-Aware Training (QAT)
  - Post-Training Quantization (PTQ)
  - KV-Cache Quantization
  - Weight & Activation Quantization
- Low-Rank Factorization
- Fine-Tuning | [GitHub](https://github.com/rsk2327/DistAya/tree/track/fine-tuning)

![Techniques](Techniques.png)

# Datasets:

Seven datasets were initially unified into a single collection of 6.62M rows:

- Bangla_Alpaca_Orca: Bangla
- Urdu_Instruct_News_Article_Generation: Urdu
- Urdu_Instruct_News_Headline_Generation: Urdu
- Urdu_Instruct_News_Category_Classification: Urdu
- cidar: Arabic
- Six_Millions_Instruction_Dataset_For_Arabic_Llm_Ft: Arabic
- instructv3: English

## Get in touch with the team:

- Mayank Bhaskar -> mayankbhaskar007@gmail.com
- Ahmad Anis -> ahmadanis5050@gmail.com
- Drishti Sharma -> drishtisharma96505@gmail.com
- Vishnu Vardhan -> vardhanvishnu691@gmail.com
- Yaya -> yayasysco@gmail.com
- Shayekh Bin Islam -> shayekh.bin.islam@gmail.com
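
As a minimal illustration of the unstructured pruning listed under Techniques, the sketch below zeros out the smallest-magnitude weights of a tensor. This is a generic magnitude-pruning example for intuition only; the function name, NumPy implementation, and tie-handling are our own assumptions, not the project's actual SparseGPT or ShortGPT code.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    Unstructured pruning: individual weights are removed anywhere in the
    tensor, with no row, column, or block pattern imposed.
    """
    k = int(weights.size * sparsity)  # number of weights to zero
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    # Ties at the threshold may prune slightly more than k weights.
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned
```

For example, `magnitude_prune(np.array([[0.1, -2.0], [0.05, 3.0]]), 0.5)` keeps `-2.0` and `3.0` and zeros the two small entries. Structured and semi-structured pruning instead remove whole rows/columns or fixed N:M blocks, which trades some accuracy for speedups on real hardware.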