---
library_name: transformers
license: apache-2.0
datasets:
- abideen/Cosmopedia-100k-pretrain
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# 🚀 BitNet-Llama3 (from 8B to 2B) Transformation & Training

This project transforms a Llama3 model from 8B parameters into a BitNet architecture with 2B parameters by replacing its linear layers with BitLinear layers. The resulting model is then trained on the abideen/Cosmopedia-100k-pretrain dataset and uploaded to Hugging Face for future use.

---

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** ejbejaranos@gmail.com
- **Funded by [optional]:** ITCL
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Llama3 8B transformed to BitNet
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model [optional]:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources [optional]

- **Repository:** ejbejaranos/Bitnet-Llama3-from8BM-now2B

## 📄 Description

This repository includes scripts to:

1. 🎯 Transform a Llama3 model into a BitNet architecture (an illustrative sketch of this step is included at the end of this card).
2. 💻 Train the model using Hugging Face and Weights & Biases.
3. 🚀 Upload the transformed and trained model to Hugging Face for inference and future use.

---

## ⚙️ Requirements

- Python 3.8+
- PyTorch 1.10+
- Transformers 4.0+
- Hugging Face Hub API
- Weights & Biases

---

## 🧰 Installation

Make sure you have all required dependencies installed:

```bash
pip install torch transformers datasets wandb huggingface_hub
```

## 💥 How to Use

1. Using the trained model for inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from utils.bitnet_transformation import replace_linears_in_hf

# Load the BitNet model from the Hub
model_id = "ejbejaranos/Bitnet-Llama3-from8BM-now2B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_auth_token="YOUR_HF_TOKEN"
)

# Swap the linear layers for BitNet layers before running inference
replace_linears_in_hf(model)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up for inference
model.to(device="cuda:0")

prompt = "What is Machine Learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]

print(output)
```

---

## 🧑‍🔬 Metrics

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/nCE1-KLDWDqSCmPtDMmWa.png)

During training, the following metrics were logged to Weights & Biases:

- `final_loss`: 1.4
- `final_perplexity`: 4.2

---

## 🎯 Future Goals

- Implement additional quantization layers for inference.
- Test the model on different datasets and contexts.

---

## 📢 Contact

If you have questions, suggestions, or improvements, feel free to open an Issue or contact us through [Hugging Face](https://huggingface.co/ejbejaranos).

---

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## 💡 Acknowledgments

Thanks to [Hugging Face](https://huggingface.co/) and [Weights & Biases](https://wandb.ai/) for providing support and tools.
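
## 🛠️ Appendix: BitLinear transformation sketch

Step 1 of the Description boils down to swapping every `nn.Linear` in the Llama3 blocks for a BitLinear layer. The snippet below is a minimal sketch of that idea following the BitNet b1.58 recipe (absmean ternary weight quantization with a straight-through estimator). The names `BitLinear` and `replace_linears` are illustrative assumptions; they are not this repository's actual `replace_linears_in_hf` implementation from `utils/bitnet_transformation.py`.

```python
# Illustrative sketch only: quantization details follow the BitNet b1.58 paper,
# not necessarily this repo's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """nn.Linear drop-in that ternarizes weights to {-1, 0, +1} at forward time."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Absmean scaling followed by rounding to {-1, 0, +1} (BitNet b1.58 style).
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1)
        # Straight-through estimator: forward uses the quantized weights,
        # gradients flow to the full-precision weights.
        w_q = w + (w_q * scale - w).detach()
        return F.linear(x, w_q, self.bias)


def replace_linears(module: nn.Module) -> None:
    """Recursively swap every nn.Linear inside `module` for a BitLinear copy."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            bit.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                bit.bias.data.copy_(child.bias.data)
            setattr(module, name, bit)
        else:
            replace_linears(child)
```

Note that this sketch only covers the layer replacement; the additional reduction from 8B to 2B parameters (fewer layers / narrower dimensions) is handled separately by the repository's transformation scripts.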