DeepSpeed compatibility

#88
by pchiang5 - opened

Thank you for the great tool. I found that DeepSpeed can be used to perform pretraining as in the example. Is it possible to extend this to fine-tuning and inference as well? Although my ZeRO-3 offload setup worked with other, larger models, directly including training_args_init = TrainingArguments(**training_args, deepspeed='ds_config_zero3.json') in the cell classification example did not work. If DeepSpeed could be used with the other steps, those with limited computing resources could run analyses with the 12L model.
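
For reference, a minimal sketch of the attempted setup is below; the training_args values and output directory are illustrative placeholders, not the exact ones from the cell classification notebook.

```python
# Minimal sketch of the attempted configuration (placeholder values).
from transformers import TrainingArguments

training_args = {
    "output_dir": "finetuned_model/",
    "per_device_train_batch_size": 12,
    "num_train_epochs": 1,
    "learning_rate": 5e-5,
}

# Add the DeepSpeed ZeRO-3 offload config alongside the existing arguments.
training_args_init = TrainingArguments(**training_args, deepspeed="ds_config_zero3.json")
```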

Thank you for your interest in Geneformer. We are glad to hear that DeepSpeed worked for the pretraining example. We definitely agree that using DeepSpeed with the fine-tuning and inference steps would be valuable for enabling use of the deeper model with fewer resources. We took care to integrate with Hugging Face to enable use of their extensive and user-friendly tools. We have modified their data loaders to function with our biological data inputs, but otherwise the trainer should function the same as for NLP applications.

We pretrained both the 6L and 12L models over 2 years ago, so there have been updates to Hugging Face's integration of DeepSpeed since then. You may consider running the fine-tuning as a script with DeepSpeed from the command line, as we have shown for the pretraining example, since that setup was already working for you.

Otherwise, we would suggest searching for the error you encountered in the Hugging Face Transformers repository's open/closed issues and/or opening a new issue there. If you do so, we encourage you to update the discussion here with a reference to the relevant issue, as that would be helpful to others in the community who encounter the same question.
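
For illustration, a minimal sketch of running the fine-tuning step as a standalone script launched with the DeepSpeed launcher might look like the following. The checkpoint directory, dataset path, and label count are placeholders, and Geneformer's modified data collator is omitted for brevity.

```python
# finetune_classifier.py -- launch with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus=1 finetune_classifier.py
from datasets import load_from_disk
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# Hypothetical tokenized cell dataset saved to disk by the tokenization step.
train_dataset = load_from_disk("cell_classification_dataset/")

# Hypothetical local checkpoint directory for the 12L model; num_labels depends
# on the number of cell classes in the task.
model = BertForSequenceClassification.from_pretrained("geneformer-12L-30M", num_labels=2)

training_args = TrainingArguments(
    output_dir="finetuned_model/",
    per_device_train_batch_size=12,
    num_train_epochs=1,
    learning_rate=5e-5,
    deepspeed="ds_config_zero3.json",  # ZeRO-3 offload config, as for pretraining
)

Trainer(model=model, args=training_args, train_dataset=train_dataset).train()
```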

ctheodoris changed discussion status to closed
