|
--- |
|
license: apache-2.0 |
|
tags: |
|
- jamba |
|
datasets: |
|
- teknium/OpenHermes-2.5 |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# This is highly experimental and should be viewed as purely testing right now. Jamba has been very hard to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here |
|
|
|
*There's been limited testing so example outputs yet* |
|
|
|
--- |
|
## Training |
|
|
|
|
|
### Open-Hermes-2.0 (Only first 1500 examples): **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]** |
|
|
|
``` |
|
1483 5.986700 |
|
1484 5.764100 |
|
1485 5.887200 |
|
1486 5.445200 |
|
1487 6.086300 |
|
1488 5.718300 |
|
1489 5.670300 |
|
1490 5.440900 |
|
1491 4.945900 |
|
1492 6.154700 |
|
1493 5.624800 |
|
1494 6.868100 |
|
1495 5.627100 |
|
1496 5.192700 |
|
1497 5.826800 |
|
1498 5.512200 |
|
1499 5.869900 |
|
1500 5.852300 |
|
1501 5.574800 |
|
1502 5.299200 |
|
1503 5.631200 |
|
1504 5.535600 |
|
1505 5.626000 |
|
1506 5.093300 |
|
1507 5.278000 |
|
1508 5.585400 |
|
1509 5.318600 |
|
1510 5.319200 |
|
1511 5.513900 |
|
1512 5.375400 |
|
1513 5.460600 |
|
1514 5.045300 |
|
1515 6.013600 |
|
1516 5.812300 |
|
1517 5.707400 |
|
1518 5.109800 |
|
1519 5.212900 |
|
1520 5.317200 |
|
1521 5.935400 |
|
1522 5.733900 |
|
1523 5.866000 |
|
1524 5.675400 |
|
1525 5.580800 |
|
1526 4.996900 |
|
1527 5.666700 |
|
1528 4.979900 |
|
``` |
|
|
|
### Hyperparameters |
|
|
|
```py |
|
from trl import SFTTrainer |
|
import torch |
|
from peft import LoraConfig |
|
from transformers import AutoTokenizer, TrainingArguments |
|
from transformers import BitsAndBytesConfig |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
# Initialize or load your tokenizer and model here |
|
tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1") |
|
tokenizer.padding_side = 'right' |
|
tokenizer.padding_side = 'left' |
|
|
|
max_seq_length = 4096 |
|
|
|
lora_config = LoraConfig( |
|
r=8, |
|
lora_alpha=16, |
|
target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"], |
|
lora_dropout=0.2, |
|
task_type="CAUSAL_LM", |
|
bias="none" |
|
) |
|
|
|
trainer = SFTTrainer( |
|
model=model, |
|
train_dataset=train_dataset, |
|
dataset_text_field="text", |
|
max_seq_length=max_seq_length, |
|
tokenizer=tokenizer, |
|
args=TrainingArguments( |
|
num_train_epochs=1, |
|
lr_scheduler_type='linear', |
|
learning_rate=2e-5, |
|
per_device_train_batch_size=1, |
|
gradient_accumulation_steps=8, |
|
gradient_checkpointing=True, |
|
warmup_steps=10, |
|
weight_decay=0.2, |
|
fp16=not torch.cuda.is_bf16_supported(), |
|
bf16=torch.cuda.is_bf16_supported(), |
|
logging_steps=1, |
|
save_steps=100, |
|
output_dir="outputs", |
|
optim="paged_adamw_8bit", |
|
seed=42, |
|
), |
|
) |
|
|
|
# Set environment variables for PyTorch memory management |
|
import os |
|
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True" |
|
``` |
|
|