Commit History
af29d81 ADD: warning if hub_model_id is set but no save strategy (#1202)
e923e62 more checks and fixes for deepspeed and fsdp (#1208) [skip ci]
33e1170 precompute dpo logprobs setting and fixes (#1199) [skip ci]
59a31fe DPO fixes v2 (#1174)
814aee6 Phi2 multipack (#1173)
cda52dc support for explicit test_dataset definition for evals (#786)
802f966 improve vram use w gradient checkpointing (#1167) [skip ci]
f5a828a Qwen2 (#1166)
2ce5c0d Deprecate max packed sequence len (#1141)
6910e6a Multipack simplify for Mixtral (#1142)
317fa25 fix bf16 check when preprocessing data (#1140)
8487b97 Add `layers_to_transform` for `lora_config` (#1118) (xzuyn)
0865613 Enable or disable bf16 support based on availability (#1116) (Simon Hällqvist)
0ce1a65 update sharegpt conversations when chatml chat template is set (#1075) [skip ci]
0f10080 be more robust about checking embedding modules for lora finetunes (#1074) [skip ci]
bdfefaf feature: better device mapping for large models (#918)
74532dd chore(config): clean up old log for Qwen (#1034)
1ffa386 Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787)
ef24342 fix: switch to using the HuggingFace Transformers NEFT implementation (#941) (kallewoof)
5f79b82 new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
992e742 Support device_map=sequential & max_memory config parameters (#903)
a1da39c Feat(wandb): Refactor to be more flexible (#767)
1115c50 Feat: Add Qwen (#894)
7ee3c4c fix: warning should not show if eval_batch_size not provided (#896)
fb12895 Feat: Add warmup_ratio (#893)
1bc1186 allow overriding of model_config parameters from the YML (#853)
2d8def6 simplify by removing duplicate base_model_config (#772)
44c9d01 Fix: Warn when fullfinetune without adapter (#770)
ca84cca convert exponential notation lr to floats (#771)
9923b72 Fix: eval table conflict with eval_sample_packing (#769)
15d3a65 Implement fused modules (#747)
2642cae refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662)
9ec2077 Make dataset_processes configurable (#651)
590d603 Fix bug when using pretokenized datasets (#652) (ich)
eb41f76 Feat: Add example for Mistral (#644)
383f88d Fix(cfg): Add validation for save_strategy and eval_strategy (#633)
e7d3e2d use fastchat conversations template (#578)
19a600a Feat: Add support for upstream FA2 (#626)
cfbce02 Fix: Fail bf16 check when running on cpu during merge (#631)
131afdb add bf16 check (#587)
62eaee7 make phi training work with Loras (#588)
2414673 E2e device cuda (#575)
f6060a6 Model parallel (#538)
5b67ea9 Add training callback to send predictions to WandB table (#521)
2f586d1 Fix pretraining with iterable/streaming Dataset (#556) (Jan Philipp Harries)