Commit History
Fix some bugs
d19a8a5
Add log info
3733ce3
Update runner
51b14d7
Add dataset creation script
c92ce97
change run.sh
70704f2
pushing tokenizer
c36ebf7
Add runner, fix some bugs
31bf2aa
Merge remote-tracking branch 'origin/saied' into develop
8918872
Remove junks
a749413
adding add and remove tag functions
a32918a
Remove extra file
4350a5a
Add normalization steps, fix some bugs, add tfboard tracker
1809a17
Refine saied code
09f9c26
some modification in preprocessing/urls removing
ad582b6
some modification in preprocessing
79fa2a7
edited data_utils-url,html,stretched alphabet
95cd35a
Fix rm files
bce7e0a
Add training script with checkpoint and preprocessing + merge scripts
7cfca48
Merge remote-tracking branch 'origin/hooman' into develop
8812e32
adding dataset preparation module
73d5951
pushing a template clm training script for gpt2
01ae861
Hooman Sedghamiz
committed on