Llama3-8B-ITCL-Bitnet1.6B 🚀

Description 📜

Llama3-8B-ITCL-Bitnet1.6B is an experimental LLM derived from Llama3. Its attention and MLP linear layers are replaced with BitLinear layers to reduce memory use and speed up inference. The model targets natural language processing tasks in resource-constrained environments. 🌟
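
As a rough illustration of the memory argument, the sketch below estimates the storage needed by the BitLinear weights alone, using the layer shapes from the reduced model structure listed in this README. The 2-bits-per-weight packing is an assumption made for the arithmetic: the BitLinear class in the Usage section keeps full-precision weights and quantizes them on the fly, so these numbers describe a potential saving, not a measured one.

```python
# Back-of-the-envelope storage estimate for the BitLinear weights only.
# Shapes are taken from the model structure printed in this README.
# The 2-bit packing is a hypothetical storage format, not what the checkpoint uses.
HIDDEN = 4096
INTERMEDIATE = 2048
NUM_LAYERS = 6

attention_weights = 4 * HIDDEN * HIDDEN      # q_proj, k_proj, v_proj, o_proj
mlp_weights = 3 * HIDDEN * INTERMEDIATE      # gate_proj, up_proj, down_proj
bitlinear_weights = NUM_LAYERS * (attention_weights + mlp_weights)

fp16_bytes = bitlinear_weights * 2                  # 16 bits per weight
packed_ternary_bytes = bitlinear_weights * 2 // 8   # 2 bits per weight

print(f"BitLinear weights: {bitlinear_weights:,}")
print(f"fp16 storage:  {fp16_bytes / 2**20:.0f} MiB")
print(f"2-bit storage: {packed_ternary_bytes / 2**20:.0f} MiB")
```

The embedding table and lm_head are left out because they stay as regular full-precision layers in this model.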


Features 🌈

  • Model Size: 1.6B parameters, reduced from the 8B-parameter Llama3 🧠
  • Architecture: BitNet 🏗️
  • Bitlinear Layers: Reduces weights to values of 1, 0, and -1. ➖
  • Optimized for: Fast inference and memory efficiency ⚡
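
To make the Bitlinear Layers bullet concrete, here is a minimal pure-Python sketch of absmean ternary quantization. It mirrors the weight_quant function from the Usage section but works on a plain list of floats; the name ternary_quantize and the sample weights are illustrative only.

```python
def ternary_quantize(weights):
    """Absmean quantization of a flat list of floats to {-1, 0, 1}.

    Returns the ternary levels and the scale used, so that
    level / scale approximates the original weight.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    scale = 1.0 / max(mean_abs, 1e-5)  # same lower bound as weight_quant
    levels = [max(-1, min(1, round(w * scale))) for w in weights]
    return levels, scale


weights = [0.9, -0.05, -1.4, 0.3]  # made-up example values
levels, scale = ternary_quantize(weights)
dequantized = [level / scale for level in levels]
print(levels)
print(dequantized)
```

Weights much smaller than the mean magnitude collapse to 0, and everything else becomes plus or minus one step of size 1 / scale.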


Model size: 1.604B parameters
2024-10-08 14:53:07 - INFO - 🔢 Number of parameters in the model after extracting weights: 1
2024-10-08 14:53:07 - INFO - Reduced model structure:
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-5): 6 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (k_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (v_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (o_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        (mlp): LlamaMLP(
          (gate_proj): BitLinear(in_features=4096, out_features=2048, bias=False)
          (up_proj): BitLinear(in_features=4096, out_features=2048, bias=False)
          (down_proj): BitLinear(in_features=2048, out_features=4096, bias=False)
          (act_fn): SiLU()
        (input_layernorm): Identity()
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
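
The reported size can be cross-checked against the structure above. The sketch below redoes the count by hand, assuming that embed_tokens and lm_head do not share weights (an assumption, though the total only matches the reported 1.604B under it) and that the rotary embeddings contribute no trainable parameters.

```python
# Recompute the parameter count from the printed model structure.
# Assumes embed_tokens and lm_head are separate (untied) weight matrices.
VOCAB, HIDDEN, INTERMEDIATE, NUM_LAYERS = 128256, 4096, 2048, 6

embed_tokens = VOCAB * HIDDEN
attention = 4 * HIDDEN * HIDDEN         # q_proj, k_proj, v_proj, o_proj
mlp = 3 * HIDDEN * INTERMEDIATE         # gate_proj, up_proj, down_proj
post_attention_layernorm = HIDDEN       # input_layernorm is Identity: no parameters
per_layer = attention + mlp + post_attention_layernorm
final_norm = HIDDEN
lm_head = VOCAB * HIDDEN

total = embed_tokens + NUM_LAYERS * per_layer + final_norm + lm_head
print(f"Model size: {total / 10**9:.3f}B parameters")
```

Roughly two thirds of the parameters sit in the embedding table and lm_head, which are not BitLinear layers.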

Requirements 📦

Install the following libraries with pip:

pip install transformers torch huggingface_hub wandb coloredlogs

Usage 🔍

Loading the Model

To use this model, load it from Hugging Face with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
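# Needs a transformers release that still defines LlamaSdpaAttention; newer releases may not.
from transformers.models.llama.modeling_llama import (
    LlamaDecoderLayer,
    LlamaMLP,
    LlamaRMSNorm,
    LlamaSdpaAttention,
)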
import torch
from torch import nn
import torch.nn.functional as F
import coloredlogs
import logging

coloredlogs.install(level='INFO', fmt='%(asctime)s - %(levelname)s - %(message)s', logger=logging.getLogger())
logger = logging.getLogger(__name__)

HF_TOKEN = "your_api_key_here"

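model_id = "ejbejaranos/Llama3-8B-ITCL-Bitnet1.6B"

# Load the tokenizer of the pretrained BitNet model
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    # padding=True below needs a pad token; reuse EOS, matching pad_token_id
    tokenizer.pad_token = tokenizer.eos_token

# Assumed arguments: the repository id and the access token defined above
model = AutoModelForCausalLM.from_pretrained(model_id, token=HF_TOKEN)

# Set the pad_token_id
model.config.pad_token_id = tokenizer.eos_token_id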

def count_parameters(model):
    # Calculate the number of parameters in billions
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) / 10**9
    print(f"Model size: {num_params:.3f}B parameters")
    return int(num_params)

def activation_quant(x):
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp_(min=1e-5)
    y = (x * scale).round().clamp_(-128, 127)
    y = y / scale
    return y

def weight_quant(w):
    scale = 1.0 / w.abs().mean().clamp_(min=1e-5)
    u = (w * scale).round().clamp_(-1, 1)
    u = u / scale
    return u

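class BitLinear(nn.Linear):
    def forward(self, x):
        w = self.weight  # a weight tensor with shape [d, k]
        x = x.to(w.device)
        # A freshly created LlamaRMSNorm has unit weights, so this is a parameter-free norm
        rms_norm = LlamaRMSNorm(x.shape[-1]).to(w.device)
        x_norm = rms_norm(x)
        # Straight-through estimator: quantized values forward, identity gradient backward
        x_quant = x_norm + (activation_quant(x_norm) - x_norm).detach()
        w_quant = w + (weight_quant(w) - w).detach()
        # Pass the bias along so it is not silently dropped (None when bias=False)
        y = F.linear(x_quant, w_quant, self.bias)
        return y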

def convert_to_bitnet(model, copy_weights):
    for name, module in model.named_modules():
        if isinstance(module, (LlamaSdpaAttention, LlamaMLP)):
            # Replace every nn.Linear inside the attention and MLP blocks with a BitLinear
            for child_name, child_module in list(module.named_children()):
                if isinstance(child_module, nn.Linear):
                    # Build the replacement on the device of the layer it replaces,
                    # so the conversion also works without a "cuda:0" device
                    bitlinear = BitLinear(child_module.in_features, child_module.out_features, child_module.bias is not None).to(child_module.weight.device)
                    if copy_weights:
                        bitlinear.weight = child_module.weight
                        if child_module.bias is not None:
                            bitlinear.bias = child_module.bias
                    setattr(module, child_name, bitlinear)
        elif isinstance(module, LlamaDecoderLayer):
            # Drop the input layer norm: BitLinear normalizes its own input
            for child_name, child_module in list(module.named_children()):
                if isinstance(child_module, LlamaRMSNorm) and child_name == "input_layernorm":
                    setattr(module, child_name, nn.Identity())

convert_to_bitnet(model, copy_weights=True)

logger.info(f"🔢 Number of parameters in the model after extracting weights: {count_parameters(model)}")
logger.info(f"Reduced model structure:\n{model}")

prompt = "What is the color of sky?"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
inputs['attention_mask'] = inputs['input_ids'] != model.config.pad_token_id

generate_ids = model.generate(inputs.input_ids, attention_mask=inputs['attention_mask'], max_length=250)
decoded_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(decoded_output[0])  # Print the generated response
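
The activation_quant function in the code above scales each row of activations by its largest magnitude and rounds to signed 8-bit levels. The following pure-Python sketch shows the same idea on a single row; the name absmax_quantize and the sample values are illustrative and not part of the model code.

```python
def absmax_quantize(row):
    """Quantize one row of activations to signed 8-bit levels (absmax scaling).

    Returns the integer levels and the scale; level / scale approximates
    the original activation.
    """
    max_abs = max(max(abs(x) for x in row), 1e-5)  # same lower bound as activation_quant
    scale = 127.0 / max_abs
    levels = [max(-128, min(127, round(x * scale))) for x in row]
    return levels, scale


row = [0.3, -1.0, 0.2]  # made-up example values
levels, scale = absmax_quantize(row)
dequantized = [level / scale for level in levels]
print(levels)
print(dequantized)
```

The largest-magnitude entry always maps to plus or minus 127, so the rounding error of every other entry is at most half a step of size 1 / scale.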

Performing Inference

Generate text by passing a prompt to the model, for example: 💬✨

- "What is the color of sky?"

Training 🏋️

To train the model, configure your settings and implement your training logic. 🛠️

Contributions 🤝

If you would like to contribute to this project, please follow these steps:

  1. Fork the repository. 🍴
  2. Create your branch (git checkout -b feature-new-feature). 🌿
  3. Make your changes and commit. 📅
  4. Push to the branch. 📤
  5. Open a Pull Request. 📬

License 📄

This project is licensed under the MIT License. See the LICENSE file for details.

Contact 📫

For questions or suggestions, feel free to reach out to me:

