	Usage
	=====

	Quick Start
	~~~~~~~~~~~

	Predict a new network using a trained model
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Pre-trained models can be downloaded from [TBD].
	Candidate pairs should be in tab-separated (``.tsv``) format with no header, and columns for [protein name 1], [protein name 2].
	Optionally, a third column with [label] can be provided so that training or test data files can be used as input directly; the label does not affect the predictions.

	.. code-block:: bash

	    dscript predict --pairs [input data] --seqs [sequences, .fasta format] --model [model file]
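
	As a rough sketch of the candidate pair format described above, the snippet below writes a small pairs file and checks that every line has two or three tab-separated columns. The file name ``pairs.tsv`` and the protein names are hypothetical placeholders, not part of D-SCRIPT.

	.. code-block:: bash

	    # Hypothetical protein names; replace with identifiers from your .fasta file.
	    printf 'proteinA\tproteinB\n' >  pairs.tsv
	    printf 'proteinA\tproteinC\n' >> pairs.tsv

	    # Count lines that do not have 2 or 3 tab-separated columns.
	    bad_lines=$(awk -F'\t' 'NF != 2 && NF != 3 { n++ } END { print n + 0 }' pairs.tsv)
	    echo "malformed lines: $bad_lines"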

	Embed sequences with language model
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Sequences should be in ``.fasta`` format.

	.. code-block:: bash

	    dscript embed --seqs [sequences] --outfile [embedding file]
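
	Before embedding, it can help to confirm how many sequences the input file holds. A minimal sketch, assuming a hypothetical file ``seqs.fasta`` with made-up records:

	.. code-block:: bash

	    # Hypothetical two-record FASTA file with made-up sequences.
	    printf '>proteinA\nMKTAYIAKQR\n>proteinB\nMSDNELLKQG\n' > seqs.fasta

	    # Each record starts with a '>' header line, so counting those counts sequences.
	    n_seqs=$(grep -c '^>' seqs.fasta)
	    echo "sequences: $n_seqs"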

	Train and save a model
	^^^^^^^^^^^^^^^^^^^^^^

	Training and validation data should be in tab-separated (``.tsv``) format with no header, and columns for [protein name 1], [protein name 2], [label].

	.. code-block:: bash

	    dscript train --train [training data] --val [validation data] --embedding [embedding file] --save-prefix [prefix]


	Evaluate a trained model
	^^^^^^^^^^^^^^^^^^^^^^^^

	.. code-block:: bash

	    dscript eval --model [model file] --test [test data] --embedding [embedding file] --outfile [result file]
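
	The quick-start steps can be chained into one script. The sketch below only uses the flags shown in this section; all file names are hypothetical, and ``MODEL`` is a placeholder for whichever model file training writes under the save prefix (its exact name is not specified here). By default the script only prints the commands; set ``DRY_RUN=0`` to execute them.

	.. code-block:: bash

	    # Hypothetical file names; adjust to your data.
	    SEQS=seqs.fasta
	    EMB=embeddings.h5
	    TRAIN=train.tsv
	    VAL=val.tsv
	    TEST=test.tsv
	    PREFIX=run1
	    MODEL=${MODEL:-model_file}   # placeholder for the model file written during training

	    # Print each command; run it only when DRY_RUN=0.
	    run() {
	        echo "+ $*"
	        if [ "${DRY_RUN:-1}" = "0" ]; then
	            "$@"
	        fi
	    }

	    run dscript embed --seqs "$SEQS" --outfile "$EMB"
	    run dscript train --train "$TRAIN" --val "$VAL" --embedding "$EMB" --save-prefix "$PREFIX"
	    run dscript eval --model "$MODEL" --test "$TEST" --embedding "$EMB" --outfile eval_results
	    run dscript predict --pairs pairs.tsv --seqs "$SEQS" --model "$MODEL"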


	Prediction
	~~~~~~~~~~

	.. code-block:: bash

	    usage: dscript predict [-h] --pairs PAIRS --model MODEL [--seqs SEQS]
	                           [--embeddings EMBEDDINGS] [-o OUTFILE] [-d DEVICE]
	                           [--thresh THRESH]

	    Make new predictions with a pre-trained model. One of --seqs and --embeddings is required.

	    optional arguments:
	      -h, --help            show this help message and exit
	      --pairs PAIRS         Candidate protein pairs to predict
	      --model MODEL         Pretrained Model
	      --seqs SEQS           Protein sequences in .fasta format
	      --embeddings EMBEDDINGS
	                            h5 file with embedded sequences
	      -o OUTFILE, --outfile OUTFILE
	                            File for predictions
	      -d DEVICE, --device DEVICE
	                            Compute device to use
	      --thresh THRESH       Positive prediction threshold - used to store contact
	                            maps and predictions in a separate file. [default:
	                            0.5]
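
	To illustrate what a positive prediction threshold does, the sketch below filters a scored pairs file with ``awk``. The three-column layout (name 1, name 2, score), the file names, and the inclusive comparison are assumptions made for illustration; they are not taken from the D-SCRIPT output specification.

	.. code-block:: bash

	    # Hypothetical prediction output: name 1, name 2, score (assumed layout).
	    printf 'proteinA\tproteinB\t0.91\n'  > predictions.tsv
	    printf 'proteinA\tproteinC\t0.12\n' >> predictions.tsv
	    printf 'proteinB\tproteinC\t0.50\n' >> predictions.tsv

	    # Keep pairs whose score is at least the threshold (0.5 is the --thresh default).
	    # Treating the boundary as inclusive is an assumption of this sketch.
	    THRESH=0.5
	    awk -F'\t' -v t="$THRESH" '$3 + 0 >= t + 0' predictions.tsv > positives.tsv
	    cat positives.tsv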

	Embedding
	~~~~~~~~~

	.. code-block:: bash

	    usage: dscript embed [-h] --seqs SEQS --outfile OUTFILE [-d DEVICE]

	    Generate new embeddings using pre-trained language model

	    optional arguments:
	      -h, --help            show this help message and exit
	      --seqs SEQS           Sequences to be embedded
	      --outfile OUTFILE     h5 file to write results
	      -d DEVICE, --device DEVICE
	                            Compute device to use

	Training
	~~~~~~~~

	.. code-block:: bash

	    usage: dscript train [-h] --train TRAIN --val VAL --embedding EMBEDDING
	                         [--augment] [--projection-dim PROJECTION_DIM]
	                         [--dropout-p DROPOUT_P] [--hidden-dim HIDDEN_DIM]
	                         [--kernel-width KERNEL_WIDTH] [--use-w]
	                         [--pool-width POOL_WIDTH]
	                         [--negative-ratio NEGATIVE_RATIO]
	                         [--epoch-scale EPOCH_SCALE] [--num-epochs NUM_EPOCHS]
	                         [--batch-size BATCH_SIZE] [--weight-decay WEIGHT_DECAY]
	                         [--lr LR] [--lambda LAMBDA_] [-o OUTFILE]
	                         [--save-prefix SAVE_PREFIX] [-d DEVICE]
	                         [--checkpoint CHECKPOINT]

	    Train a new model

	    optional arguments:
	      -h, --help            show this help message and exit

	    Data:
	      --train TRAIN         Training data
	      --val VAL             Validation data
	      --embedding EMBEDDING
	                            h5 file with embedded sequences
	      --augment             Set flag to augment data by adding (B A) for all pairs
	                            (A B)

	    Projection Module:
	      --projection-dim PROJECTION_DIM
	                            Dimension of embedding projection layer (default: 100)
	      --dropout-p DROPOUT_P
	                            Parameter p for embedding dropout layer (default: 0.5)

	    Contact Module:
	      --hidden-dim HIDDEN_DIM
	                            Number of hidden units for comparison layer in contact
	                            prediction (default: 50)
	      --kernel-width KERNEL_WIDTH
	                            Width of convolutional filter for contact prediction
	                            (default: 7)

	    Interaction Module:
	      --use-w               Use weight matrix in interaction prediction model
	      --pool-width POOL_WIDTH
	                            Size of max-pool in interaction model (default: 9)

	    Training:
	      --negative-ratio NEGATIVE_RATIO
	                            Number of negative training samples for each positive
	                            training sample (default: 10)
	      --epoch-scale EPOCH_SCALE
	                            Report heldout performance every this many epochs
	                            (default: 5)
	      --num-epochs NUM_EPOCHS
	                            Number of epochs (default: 100)
	      --batch-size BATCH_SIZE
	                            Minibatch size (default: 25)
	      --weight-decay WEIGHT_DECAY
	                            L2 regularization (default: 0)
	      --lr LR               Learning rate (default: 0.001)
	      --lambda LAMBDA_      Weight on the similarity objective (default: 0.35)

	    Output and Device:
	      -o OUTPUT, --output OUTPUT
	                            Output file path (default: stdout)
	      --save-prefix SAVE_PREFIX
	                            Path prefix for saving models
	      -d DEVICE, --device DEVICE
	                            Compute device to use
	      --checkpoint CHECKPOINT
	                            Checkpoint model to start training from
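
	The ``--augment`` flag is described as adding (B A) for every pair (A B). The sketch below shows what that transformation looks like when applied to a data file by hand; it illustrates the idea only and is not how D-SCRIPT implements it internally. File names, protein names, and label values are hypothetical.

	.. code-block:: bash

	    # Hypothetical training pairs: name 1, name 2, label.
	    printf 'proteinA\tproteinB\t1\n'  > train.tsv
	    printf 'proteinA\tproteinC\t0\n' >> train.tsv

	    # Emit each pair (A B) followed by its swapped copy (B A) with the same label.
	    awk -F'\t' -v OFS='\t' '{ print $1, $2, $3; print $2, $1, $3 }' train.tsv > train_augmented.tsv
	    cat train_augmented.tsv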

	Evaluation
	~~~~~~~~~~

	.. code-block:: bash

	    usage: dscript eval [-h] --model MODEL --test TEST --embedding EMBEDDING
	                        [-o OUTFILE] [-d DEVICE]

	    Evaluate a trained model

	    optional arguments:
	      -h, --help            show this help message and exit
	      --model MODEL         Trained prediction model
	      --test TEST           Test Data
	      --embedding EMBEDDING
	                            h5 file with embedded sequences
	      -o OUTFILE, --outfile OUTFILE
	                            Output file to write results
	      -d DEVICE, --device DEVICE
	                            Compute device to use