Caduceus for Transfer Learning

by Sethulakshmi - opened Mar 10, 2024

Mar 10, 2024

I would like to use the Caduceus model as a pre-trained model to detect whether an input DNA sequence in FASTA format is affected by a genetic disease. I have very little data: only a few affected and unaffected files for each gene related to the disease. I am having trouble compiling the model with its compile function.

yairschiff

Kuleshov Group org Mar 10, 2024

Our model was trained with PyTorch, but it looks like you are using Keras. I don't think our model will be compatible with this code snippet.

Sethulakshmi

Mar 12, 2024

Can you suggest how I could include the model for transfer learning and train it a bit on my own FASTA dataset?

yairschiff

Kuleshov Group org Mar 13, 2024

Are you able to use PyTorch? If so, you can load the model from HF using the steps in the README here, e.g.

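from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "kuleshov-group/caduceus-ps_seqlen-131k_d_model-256_n_layer-16"
# trust_remote_code=True lets transformers load the custom Caduceus code from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)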

Then you can use model inside your own PyTorch training loop with your dataset.
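For the affected/unaffected use case, a training loop could look roughly like the sketch below. It is not taken from the Caduceus repo: read_fasta, SequenceClassifier and train_step are hypothetical helpers, and mean-pooling the hidden states into a linear head with a frozen backbone is just one common choice when labeled data is scarce. The sketch also assumes the backbone returns either a plain tensor or an output object with a last_hidden_state attribute.

```python
import torch
from torch import nn


def read_fasta(path):
    """Return the sequences in a FASTA file as a list of uppercase strings."""
    sequences, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):  # a header line starts a new record
                if chunks:
                    sequences.append("".join(chunks))
                chunks = []
            elif line:
                chunks.append(line.upper())
    if chunks:
        sequences.append("".join(chunks))
    return sequences


class SequenceClassifier(nn.Module):
    """Hypothetical wrapper: mean-pool backbone hidden states, then a linear head."""

    def __init__(self, backbone, hidden_size, num_labels=2, freeze_backbone=True):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(hidden_size, num_labels)
        if freeze_backbone:
            # With very little data, train only the head at first.
            for param in self.backbone.parameters():
                param.requires_grad = False

    def forward(self, input_ids):
        hidden = self.backbone(input_ids)
        # Assumption: non-tensor outputs expose last_hidden_state.
        if not torch.is_tensor(hidden):
            hidden = hidden.last_hidden_state
        # Average over the sequence dimension: (batch, length, dim) -> (batch, dim)
        return self.head(hidden.mean(dim=1))


def train_step(classifier, optimizer, input_ids, labels):
    """Run one gradient step and return the loss as a float."""
    classifier.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(classifier(input_ids), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

To wire this up, input_ids would come from tokenizer(sequence, return_tensors="pt")["input_ids"], and the optimizer would be built over classifier.head.parameters() while the backbone is frozen. hidden_size has to match the last dimension of whatever the backbone actually returns, so it is safer to inspect one forward pass than to assume it from the model name.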

Let me know if that helps clarify things.
