---
library_name: transformers
license: apache-2.0
datasets:
- abideen/Cosmopedia-100k-pretrain
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# 🚀 BitNet-Llama3 (from 8B to 2B) Transformation & Training

This project transforms a Llama3 model from 8B parameters into a BitNet architecture with 2B parameters by replacing its linear layers with BitLinear layers. The resulting model is then trained on the abideen/Cosmopedia-100k-pretrain dataset and uploaded to Hugging Face for future use.

---

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** ejbejaranos@gmail.com
- **Funded by [optional]:** ITCL
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Llama3 8B transformed to BitNet
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model [optional]:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources [optional]

- **Repository:** ejbejaranos/Bitnet-Llama3-from8BM-now2B

## 📄 Description

This repository includes scripts to:

1. 🎯 Transform a Llama3 model into a BitNet architecture (an illustrative sketch of this step is included at the end of this card).
2. 💻 Train the model using Hugging Face and Weights & Biases.
3. 🚀 Upload the transformed and trained model to Hugging Face for inference and future use.

---

## ⚙️ Requirements

- Python 3.8+
- PyTorch 1.10+
- Transformers 4.0+
- Hugging Face Hub API
- Weights & Biases

---

## 🧰 Installation

Make sure you have all required dependencies installed:

```bash
pip install torch transformers datasets wandb huggingface_hub
```

## 💥 How to Use

1. Using the trained model for inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from utils.bitnet_transformation import replace_linears_in_hf

# Load the BitNet model from the Hub
model_id = "ejbejaranos/Bitnet-Llama3-from8BM-now2B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_auth_token="YOUR_HF_TOKEN"
)

# Swap the linear layers for BitNet layers before running inference
replace_linears_in_hf(model)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up for inference
model.to(device="cuda:0")

prompt = "What is Machine Learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]

print(output)
```

---

## 🧑‍🔬 Metrics

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/nCE1-KLDWDqSCmPtDMmWa.png)

During training, the following metrics were logged to Weights & Biases:

- `final_loss`: 1.4
- `final_perplexity`: 4.2

---

## 🎯 Future Goals

- Implement additional quantization layers for inference.
- Test the model on different datasets and contexts.

---

## 📢 Contact

If you have questions, suggestions, or improvements, feel free to open an Issue or contact us through [Hugging Face](https://huggingface.co/ejbejaranos).

---

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## 💡 Acknowledgments

Thanks to [Hugging Face](https://huggingface.co/) and [Weights & Biases](https://wandb.ai/) for providing support and tools.
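
## 🛠️ Appendix: BitLinear transformation sketch

Step 1 of the Description boils down to swapping every `nn.Linear` in the Llama3 blocks for a BitLinear layer. The snippet below is a minimal sketch of that idea following the BitNet b1.58 recipe (absmean ternary weight quantization with a straight-through estimator). The names `BitLinear` and `replace_linears` are illustrative assumptions; they are not this repository's actual `replace_linears_in_hf` implementation from `utils/bitnet_transformation.py`.

```python
# Illustrative sketch only: quantization details follow the BitNet b1.58 paper,
# not necessarily this repo's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """nn.Linear drop-in that ternarizes weights to {-1, 0, +1} at forward time."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Absmean scaling followed by rounding to {-1, 0, +1} (BitNet b1.58 style).
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1)
        # Straight-through estimator: forward uses the quantized weights,
        # gradients flow to the full-precision weights.
        w_q = w + (w_q * scale - w).detach()
        return F.linear(x, w_q, self.bias)


def replace_linears(module: nn.Module) -> None:
    """Recursively swap every nn.Linear inside `module` for a BitLinear copy."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            bit.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                bit.bias.data.copy_(child.bias.data)
            setattr(module, name, bit)
        else:
            replace_linears(child)
```

Note that this sketch only covers the layer replacement; the additional reduction from 8B to 2B parameters (fewer layers / narrower dimensions) is handled separately by the repository's transformation scripts.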