---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-V2
- stingning/ultrachat
language:
- fr
- en
---

# Mambaoutai 1.6B

Mambaoutai is the result of all the experiments and training runs described in the [following blog post](https://www.lighton.ai/fr/blog/blog-4/passing-the-torch-training-a-mamba-model-for-smooth-handover-54), which shares all the details about the model series. Mambaoutai is a series of small Mamba checkpoints released for the community to explore, trained on French, English and code. We ran two different decay phases with the WSD (warmup-stable-decay) scheduler and release model checkpoints pretrained both with and without instruction data.

## Usage

You need `transformers>=4.39.0`. Until that version is released, install `transformers` from `main`:

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal_conv1d` and `mamba-ssm`:

```bash
# Quote the requirement so the shell does not treat ">" as a redirection
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```

If either package is missing, the "eager" implementation is used. Otherwise, the more optimised CUDA kernels are used.

### Generation

Use this snippet to generate text from the model:

```python
from transformers import AutoTokenizer, MambaForCausalLM

# Set to True when using a checkpoint trained with instruction data
model_has_instruct_data = False

if model_has_instruct_data:
    # use chat tokens
    prompt = "Tell me something about Paris."
else:
    # prompt the non-instruction-tuned model gently
    prompt = "This is a text about Paris. Paris is"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```

### Training checkpoints

Some of the training checkpoints are available as branches of this repository. Each branch corresponds to the model at a given point during training.
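To find out which checkpoint branches exist before choosing a `revision`, the `huggingface_hub` function `list_repo_refs` returns the branches of a Hub repository. The sketch below assumes that pretraining checkpoints follow the `pre-<step>` naming of the `pre-30000` example; any other naming scheme, and the helpers `closest_pretraining_checkpoint` and `list_branch_names`, are hypothetical and only illustrate the idea.

```python
import re
from typing import Iterable, List, Optional

# Assumption: pretraining checkpoint branches are named "pre-<step>", as in "pre-30000".
PRETRAIN_BRANCH = re.compile(r"^pre-(\d+)$")


def closest_pretraining_checkpoint(branch_names: Iterable[str], target_step: int) -> Optional[str]:
    """Hypothetical helper: return the latest "pre-<step>" branch whose step is <= target_step."""
    best_step, best_name = -1, None
    for name in branch_names:
        match = PRETRAIN_BRANCH.match(name)
        if match is None:
            continue  # skip "main" and any branch that does not follow the assumed scheme
        step = int(match.group(1))
        if best_step < step <= target_step:
            best_step, best_name = step, name
    return best_name


def list_branch_names(repo_id: str = "lightonai/mambaoutai") -> List[str]:
    """Hypothetical helper: fetch branch names from the Hub (needs network and huggingface_hub)."""
    from huggingface_hub import list_repo_refs  # imported lazily so the helper above works offline

    return [branch.name for branch in list_repo_refs(repo_id).branches]
```

For example, `closest_pretraining_checkpoint(list_branch_names(), 30000)` returns a branch name that can be passed as `revision=` to `from_pretrained`, or `None` if no branch matches the assumed pattern.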
You can run inference with these training checkpoints by passing the `revision` parameter to the `from_pretrained` method. For example, to load the checkpoint saved after 30000 steps of pretraining:

```python
from transformers import AutoTokenizer, MambaForCausalLM

# "pre-30000" is the branch holding the checkpoint after 30000 pretraining steps
tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision="pre-30000")

input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```

### On-device Inference

Since Mambaoutai has only 1.6B parameters, it runs quickly on a CPU. Here is an example of how to run it with llama.cpp:

```bash
# Clone the llama.cpp repository and compile it from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Create a conda environment and install the dependencies
conda create -n mamba-cpp python=3.10
conda activate mamba-cpp
pip install -r requirements/requirements-convert-hf-to-gguf.txt

# Download the weights, tokenizer, config, tokenizer_config and special_tokens_map from this repo and
# put them in a directory 'Mambaoutai/'
mkdir Mambaoutai

# Convert the weights to GGUF format
python convert-hf-to-gguf.py Mambaoutai

# Run inference with a prompt
./main -m Mambaoutai/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 1
```

### Model hyperparameters

More details about the model hyperparameters are given in the table below:

| Parameter             | Value    |
|-----------------------|----------|
| d_model               | 2688     |
| n_layer               | 28       |
| vocab_size            | 65024    |
| context_len           | 4096     |
| rms_norm              | true     |
| residual_in_fp32      | true     |
| fused_add_norm        | true     |
| conv_kernel           | 4        |
| d_inner               | 5376     |
| state_size            | 16       |
| dtype                 | bfloat16 |
| tie_word_embeddings   | false    |
| non-embedding params  | 1.27B    |
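As a sanity check, the non-embedding parameter count in the table can be recomputed from the other hyperparameters. This sketch assumes the block layout of the `transformers` Mamba implementation with its default settings, which the table does not list: no bias on the input and output projections, a bias on the depthwise convolution, and a time-step rank of `ceil(d_model / 16)`. Under these assumptions it gives about 1.27B non-embedding parameters, plus about 0.35B for the untied input embedding and output head, consistent with the 1.6B in the model name. It also counts the recurrent state kept per sequence during generation, which does not grow with the context length.

```python
import math

# Values from the hyperparameter table
D_MODEL = 2688
N_LAYER = 28
VOCAB_SIZE = 65024
CONV_KERNEL = 4
D_INNER = 5376
STATE_SIZE = 16
TIE_WORD_EMBEDDINGS = False

# Assumption: the time-step rank follows the "auto" rule ceil(d_model / 16)
DT_RANK = math.ceil(D_MODEL / 16)


def block_params() -> int:
    """Parameters of one Mamba block (mixer + RMSNorm), assuming the default layout."""
    in_proj = D_MODEL * 2 * D_INNER                # Linear(d_model -> 2*d_inner), no bias
    conv1d = D_INNER * CONV_KERNEL + D_INNER       # depthwise conv weights + bias
    x_proj = D_INNER * (DT_RANK + 2 * STATE_SIZE)  # Linear(d_inner -> dt_rank + 2*state), no bias
    dt_proj = DT_RANK * D_INNER + D_INNER          # Linear(dt_rank -> d_inner) with bias
    a_log = D_INNER * STATE_SIZE                   # A matrix, stored as log(A)
    d_skip = D_INNER                               # D skip-connection vector
    out_proj = D_INNER * D_MODEL                   # Linear(d_inner -> d_model), no bias
    norm = D_MODEL                                 # RMSNorm weight
    return in_proj + conv1d + x_proj + dt_proj + a_log + d_skip + out_proj + norm


def non_embedding_params() -> int:
    # All blocks plus the final RMSNorm
    return N_LAYER * block_params() + D_MODEL


def embedding_params() -> int:
    # Input embedding, plus a separate LM head when embeddings are not tied
    per_matrix = VOCAB_SIZE * D_MODEL
    return per_matrix if TIE_WORD_EMBEDDINGS else 2 * per_matrix


def recurrent_state_values() -> int:
    # Per sequence and per layer: a conv state (d_inner x conv_kernel) and an SSM state (d_inner x state_size)
    return N_LAYER * D_INNER * (CONV_KERNEL + STATE_SIZE)


if __name__ == "__main__":
    print(f"non-embedding: {non_embedding_params() / 1e9:.2f}B")
    print(f"embeddings:    {embedding_params() / 1e9:.2f}B")
    print(f"total:         {(non_embedding_params() + embedding_params()) / 1e9:.2f}B")
    print(f"recurrent state values per sequence: {recurrent_state_values():,}")
```

Run as a script, it reports 1.27B non-embedding, 0.35B embedding and 1.62B total parameters, and 3,010,560 recurrent state values per sequence (about 12 MB if stored in float32), regardless of how long the prompt is.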