---
base_model:
- concedo/KobbleTinyV2-1.1B
library_name: transformers
tags:
- mergekit
- merge
---

# This is the GGUF variant!

The Original Model is [here](https://huggingface.co/Aculi/Tinyllama-2B)

Try this Model in Q8 on my homepage [here](https://home.acu.li/)

# Tinyllama-2B

This is a merge and a finetune meant to create a small but very usable model, and I have to say, it's very good.

## Basic Question:

<img src="https://huggingface.co/Aculi/Tinyllama-2B/resolve/main/.huggingface/Screenshot%202024-07-29%20073647.jpg" alt="download.png" width="800" />

## Prompt Template

Tinyllama-2B uses Alpaca:

```
### Instruction:
{prompt}

### Response:
```
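
Since this repo hosts the GGUF quantizations, one quick way to try the model locally is llama-cpp-python. The snippet below is only a sketch: the GGUF filename and the generation settings are assumptions, so substitute the quant you actually downloaded from this repo.

```python
# Minimal sketch: load a GGUF quant of Tinyllama-2B with llama-cpp-python
# and prompt it with the Alpaca template shown above.
from llama_cpp import Llama

llm = Llama(
    model_path="./tinyllama-2b.Q8_0.gguf",  # hypothetical filename; use the file you downloaded
    n_ctx=2048,                             # matches the sequence_len used for the finetune
)

prompt = """### Instruction:
Write a short greeting.

### Response:
"""

out = llm(prompt, max_tokens=128, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```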

## Merge Info:

This is a frankenmerge of [concedo/KobbleTinyV2-1.1B](https://huggingface.co/concedo/KobbleTinyV2-1.1B).

The following YAML configuration was used to produce this model:

```yaml
dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 16]
    model: concedo/KobbleTinyV2-1.1B
- sources:
  - layer_range: [5, 16]
    model: concedo/KobbleTinyV2-1.1B
    parameters:
      scale:
      - filter: o_proj
        value: 0.0
      - filter: down_proj
        value: 0.0
      - value: 1.0
- sources:
  - layer_range: [5, 16]
    model: concedo/KobbleTinyV2-1.1B
    parameters:
      scale:
      - filter: o_proj
        value: 0.0
      - filter: down_proj
        value: 0.0
      - value: 1.0
- sources:
  - layer_range: [16, 22]
    model: concedo/KobbleTinyV2-1.1B
```
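
To reproduce the merge, the configuration above can be handed to mergekit, either through the `mergekit-yaml` CLI or its Python entry point. The sketch below assumes the YAML is saved as `config.yml` and a recent mergekit release; the output directory name is just a placeholder.

```python
# Minimal sketch: run the passthrough frankenmerge above with mergekit.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./Tinyllama-2B",                  # placeholder output directory
    options=MergeOptions(copy_tokenizer=True),  # keep the KobbleTinyV2 tokenizer
)
```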

## Finetune Info:

The following YAML configuration was used to finetune this model:

```yaml
base_model: Fischerboot/2b-tiny-llama-alpaca-instr
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: Fischerboot/freedom-rp-alpaca-shortend
    type: alpaca
  - path: diffnamehard/toxic-dpo-v0.1-NoWarning-alpaca
    type: alpaca
  - path: Fischerboot/alpaca-undensored-fixed-50k
    type: alpaca
  - path: Fischerboot/DAN-alpaca
    type: alpaca
  - path: Fischerboot/rp-alpaca-next-oone
    type: alpaca

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/24r

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
```
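
The training run itself was done with axolotl using the config above. For readers who want to see what those QLoRA settings correspond to outside of axolotl, here is a rough approximation using transformers, bitsandbytes and PEFT. It is a sketch only, not the actual training script, and `target_modules="all-linear"` stands in for `lora_target_linear: true`.

```python
# Rough equivalent of the QLoRA settings above (illustration, not the real script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "Fischerboot/2b-tiny-llama-alpaca-instr"  # base_model from the config above

# load_in_4bit: true
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)
model = prepare_model_for_kbit_training(model)

# lora_r: 32, lora_alpha: 16, lora_dropout: 0.05, lora_target_linear: true
lora = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```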

### Training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7881        | 0.0017 | 1    | 2.5329          |
| 1.6899        | 0.4996 | 287  | 1.9272          |
| 1.5511        | 0.9991 | 574  | 1.8750          |
| 1.4797        | 1.4861 | 861  | 1.8476          |
| 1.5279        | 1.9856 | 1148 | 1.8270          |
| 1.4583        | 2.4726 | 1435 | 1.8275          |
| 1.5044        | 2.9721 | 1722 | 1.8215          |
| 1.3051        | 3.4582 | 2009 | 1.8243          |
| 1.5619        | 3.9578 | 2296 | 1.8245          |