|
Usage |
|
===== |
|
|
|
Quick Start |
|
~~~~~~~~~~~ |
|
|
|
Predict a new network using a trained model |
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
Pre-trained models can be downloaded from [TBD]. |
|
Candidate pairs should be in tab-separated (``.tsv``) format with no header, and columns for [protein name 1], [protein name 2]. |
|
Optionally, a third column with [label] can be provided, so predictions can be made using training or test data files (but the label will not affect the predictions). |
|
|
|
.. code-block:: bash |
|
|
|
dscript predict --pairs [input data] --seqs [sequences, .fasta format] --model [model file] |
|
|
|
Embed sequences with language model |
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
Sequences should be in ``.fasta`` format. |
|
|
|
.. code-block:: bash |
|
|
|
dscript embed --seqs [sequences] --outfile [embedding file] |
|
|
|
Train and save a model |
|
^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
Training and validation data should be in tab-separated (``.tsv``) format with no header, and columns for [protein name 1], [protein name 2], [label]. |
|
|
|
.. code-block:: bash |
|
|
|
dscript train --train [training data] --val [validation data] --embedding [embedding file] --save-prefix [prefix] |
|
|
|
|
|
Evaluate a trained model |
|
^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
.. code-block:: bash |
|
|
|
dscript eval --model [model file] --test [test data] --embedding [embedding file] --outfile [result file] |
|
|
|
|
|
Prediction |
|
~~~~~~~~~~ |
|
|
|
.. code-block:: bash |
|
|
|
usage: dscript predict [-h] --pairs PAIRS --model MODEL [--seqs SEQS] |
|
[--embeddings EMBEDDINGS] [-o OUTFILE] [-d DEVICE] |
|
[--thresh THRESH] |
|
|
|
Make new predictions with a pre-trained model. One of --seqs and --embeddings is required. |
|
|
|
optional arguments: |
|
-h, --help show this help message and exit |
|
--pairs PAIRS Candidate protein pairs to predict |
|
--model MODEL Pretrained Model |
|
--seqs SEQS Protein sequences in .fasta format |
|
--embeddings EMBEDDINGS |
|
h5 file with embedded sequences |
|
-o OUTFILE, --outfile OUTFILE |
|
File for predictions |
|
-d DEVICE, --device DEVICE |
|
Compute device to use |
|
--thresh THRESH Positive prediction threshold - used to store contact |
|
maps and predictions in a separate file. [default: |
|
0.5] |
|
|
|
Embedding |
|
~~~~~~~~~ |
|
|
|
.. code-block:: bash |
|
|
|
usage: dscript embed [-h] --seqs SEQS --outfile OUTFILE [-d DEVICE] |
|
|
|
Generate new embeddings using pre-trained language model |
|
|
|
optional arguments: |
|
-h, --help show this help message and exit |
|
--seqs SEQS Sequences to be embedded |
|
--outfile OUTFILE h5 file to write results |
|
-d DEVICE, --device DEVICE |
|
Compute device to use |
|
|
|
Training |
|
~~~~~~~~ |
|
|
|
.. code-block:: bash |
|
|
|
usage: dscript train [-h] --train TRAIN --val VAL --embedding EMBEDDING |
|
[--augment] [--projection-dim PROJECTION_DIM] |
|
[--dropout-p DROPOUT_P] [--hidden-dim HIDDEN_DIM] |
|
[--kernel-width KERNEL_WIDTH] [--use-w] |
|
[--pool-width POOL_WIDTH] |
|
[--negative-ratio NEGATIVE_RATIO] |
|
[--epoch-scale EPOCH_SCALE] [--num-epochs NUM_EPOCHS] |
|
[--batch-size BATCH_SIZE] [--weight-decay WEIGHT_DECAY] |
|
[--lr LR] [--lambda LAMBDA_] [-o OUTFILE] |
|
[--save-prefix SAVE_PREFIX] [-d DEVICE] |
|
[--checkpoint CHECKPOINT] |
|
|
|
Train a new model |
|
|
|
optional arguments: |
|
-h, --help show this help message and exit |
|
|
|
Data: |
|
--train TRAIN Training data |
|
--val VAL Validation data |
|
--embedding EMBEDDING |
|
h5 file with embedded sequences |
|
--augment Set flag to augment data by adding (B A) for all pairs |
|
(A B) |
|
|
|
Projection Module: |
|
--projection-dim PROJECTION_DIM |
|
Dimension of embedding projection layer (default: 100) |
|
--dropout-p DROPOUT_P |
|
Parameter p for embedding dropout layer (default: 0.5) |
|
|
|
Contact Module: |
|
--hidden-dim HIDDEN_DIM |
|
Number of hidden units for comparison layer in contact |
|
prediction (default: 50) |
|
--kernel-width KERNEL_WIDTH |
|
Width of convolutional filter for contact prediction |
|
(default: 7) |
|
|
|
Interaction Module: |
|
--use-w Use weight matrix in interaction prediction model |
|
--pool-width POOL_WIDTH |
|
Size of max-pool in interaction model (default: 9) |
|
|
|
Training: |
|
--negative-ratio NEGATIVE_RATIO |
|
Number of negative training samples for each positive |
|
training sample (default: 10) |
|
--epoch-scale EPOCH_SCALE |
|
Report heldout performance every this many epochs |
|
(default: 5) |
|
--num-epochs NUM_EPOCHS |
|
Number of epochs (default: 100) |
|
--batch-size BATCH_SIZE |
|
Minibatch size (default: 25) |
|
--weight-decay WEIGHT_DECAY |
|
L2 regularization (default: 0) |
|
--lr LR Learning rate (default: 0.001) |
|
--lambda LAMBDA_ Weight on the similarity objective (default: 0.35) |
|
|
|
Output and Device: |
|
-o OUTPUT, --output OUTPUT |
|
Output file path (default: stdout) |
|
--save-prefix SAVE_PREFIX |
|
Path prefix for saving models |
|
-d DEVICE, --device DEVICE |
|
Compute device to use |
|
--checkpoint CHECKPOINT |
|
Checkpoint model to start training from`` |
|
|
|
Evaluation |
|
~~~~~~~~~~ |
|
|
|
.. code-block:: bash |
|
|
|
usage: dscript eval [-h] --model MODEL --test TEST --embedding EMBEDDING |
|
[-o OUTFILE] [-d DEVICE] |
|
|
|
Evaluate a trained model |
|
|
|
optional arguments: |
|
-h, --help show this help message and exit |
|
--model MODEL Trained prediction model |
|
--test TEST Test Data |
|
--embedding EMBEDDING |
|
h5 file with embedded sequences |
|
-o OUTFILE, --outfile OUTFILE |
|
Output file to write results |
|
-d DEVICE, --device DEVICE |
|
Compute device to use |
|
|