| ## Setup | |
| No setup is required. Simply fill in the input boxes with the necessary data and click the **Run** button. | |
| You can find a list of examples at the bottom of the page; clicking on them will autofill the fields for you. | |
| If the server remains idle for a period, it will enter standby mode. Running a calculation will wake the tool from standby, but note that the first run may take longer due to startup and model loading. | |
| ## Input | |
| **Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box. | |
| Note: Wildcard and ambiguity characters (e.g., `-`, `X`, `.`, `B`) can be included in the sequence, but they currently cannot be visualized. | |
| **Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports four running modes, selected automatically based on your input: | |
| - **Single Substitution**: Input one or more substitutions (e.g. `R218K R218W`) to score specific changes. | |
| - **Residue Position**: Provide residue positions to evaluate all possible substitutions at those sites. | |
| - **Same-Length Sequence**: Provide a second sequence of the same length as the input sequence; each position where the two sequences differ is scored one by one as an individual substitution. | |
| - **Other Input**: For any other input format, a deep mutational scan of the full sequence is performed. | |
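The four modes above can be pictured as a simple dispatch on the contents of the **Substitutions** box. The following is a minimal sketch, not the tool's actual parser: the function name `detect_mode`, the returned labels, and the exact matching rules are assumptions made for illustration.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One substitution token: wild-type residue, 1-based position, new residue (e.g. R218K).
SUBSTITUTION = re.compile(rf"^[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]$")


def detect_mode(sequence: str, substitutions: str) -> str:
    """Guess which running mode an input would select (illustrative only)."""
    tokens = substitutions.split()
    # Every token looks like R218K: score those specific changes.
    if tokens and all(SUBSTITUTION.match(t) for t in tokens):
        return "single_substitution"
    # Every token is a number: score all substitutions at those positions.
    if tokens and all(t.isdigit() for t in tokens):
        return "residue_position"
    # A single alphabetic token as long as the input sequence: compare the two.
    if len(tokens) == 1 and tokens[0].isalpha() and len(tokens[0]) == len(sequence):
        return "same_length_sequence"
    # Anything else (including an empty box) falls back to a full scan.
    return "deep_mutational_scan"
```

For example, under these assumed rules `detect_mode("MKTAYIAK", "K2R K2W")` selects the single-substitution mode, while `detect_mode("MKTAYIAK", "2 5")` selects the residue-position mode.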
| **Model Selection**: Choose a model for calculations from those available on Hugging Face Model Hub. | |
| The `esm2_t33_650M_UR50D` model offers a good balance between computational cost and accuracy [*](https://doi.org/10.1126/science.ade2574). | |
| **Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which masks each substituted position and scores it from the surrounding sequence context during inference. | |
| While this method is slower, it enhances accuracy. | |
| If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy. | |
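As a rough illustration of masked-marginals scoring, the sketch below masks the substituted position and compares the log-probability of the new residue with that of the wild-type residue. The `predict` callable is a hypothetical stand-in for a protein language model, and `toy_predict` is a fake model used only so the example runs; the tool's actual implementation may differ.

```python
import math
from typing import Callable, Dict, List

# Hypothetical model interface: receives the residues with one position masked
# and returns log-probabilities for each candidate residue at that position.
PredictFn = Callable[[List[str], int], Dict[str, float]]


def masked_marginal_score(sequence: str, position: int, new_residue: str,
                          predict: PredictFn) -> float:
    """Score one substitution; `position` is 1-based, as in R218K."""
    index = position - 1
    tokens = list(sequence)
    wild_type = tokens[index]
    tokens[index] = "<mask>"  # hide the residue so only its context is visible
    log_probs = predict(tokens, index)
    # Positive: the model prefers the new residue; negative: it prefers wild type.
    return log_probs[new_residue] - log_probs[wild_type]


def toy_predict(tokens: List[str], index: int) -> Dict[str, float]:
    """Toy model for demonstration: ignores the context and always favours R."""
    return {"R": math.log(0.5), "K": math.log(0.25), "W": math.log(0.25)}


score = masked_marginal_score("MRT", 2, "K", toy_predict)
```

Because every substituted position needs its own masked forward pass in this scheme, the runtime grows with the number of positions tested, which is consistent with the slower runtimes noted above.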
| **Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) due to significant runtime concerns, especially with longer sequences or during peak server usage times. | |
| For example, a deep mutational scan of a 300-residue sequence with the larger models may take over 30 minutes. | |
| Generally, accuracy is more affected by the scoring strategy than by model size; therefore, prioritize reducing model size when optimizing for runtime. | |
| The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length. | |
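To get a feel for how quickly the number of substitutions grows in a deep mutational scan, the following sketch enumerates every single substitution of a sequence. The helper name `enumerate_scan` is hypothetical, and the sketch assumes only the 20 standard amino acids are considered.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_scan(sequence: str) -> list:
    """List every single substitution (e.g. 'M1A') for a full scan."""
    return [
        f"{wild_type}{position}{new_residue}"
        for position, wild_type in enumerate(sequence, start=1)
        for new_residue in AMINO_ACIDS
        if new_residue != wild_type  # skip the unchanged residue
    ]


substitutions = enumerate_scan("MKT")
print(len(substitutions))  # 3 positions x 19 alternatives = 57
```

By the same count, a 300-residue sequence made of standard amino acids yields 300 × 19 = 5,700 substitutions to score.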
| **Concurrent Substitutions**: | |
| To calculate the effect of multiple concurrent substitutions, you must manually edit the input sequence to include the additional substitutions and rerun the calculation. | |
| Accuracy is not guaranteed, as this use case has not yet been tested. | |
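Editing the sequence by hand is error-prone, so a small helper can prepare the modified input. This is a minimal sketch and not part of the tool: `apply_substitutions` is a hypothetical name, and the substitution syntax is assumed to match the `R218K` format used in the **Substitutions** box.

```python
import re


def apply_substitutions(sequence: str, substitutions: str) -> str:
    """Return `sequence` with substitutions such as 'K2R A4G' applied."""
    residues = list(sequence)
    for token in substitutions.split():
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"Unrecognized substitution: {token}")
        wild_type, new_residue = match.group(1), match.group(3)
        position = int(match.group(2))  # 1-based, as in R218K
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position out of range: {token}")
        # Check against the original sequence to catch numbering mistakes.
        if sequence[position - 1] != wild_type:
            raise ValueError(f"Wild-type residue mismatch: {token}")
        residues[position - 1] = new_residue
    return "".join(residues)


background = apply_substitutions("MKTAYIAK", "K2R")
print(background)  # MRTAYIAK
```

To test a second substitution such as `A4G` on top of `K2R`, you would paste the returned sequence into the **Sequence** box and enter `A4G` in the **Substitutions** box.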
| ## Output | |
| Results are displayed in a colour-coded table, except for deep mutational scans, which produce a heatmap. | |
| In the table: | |
| - Beneficial substitutions are highlighted in green with positive values. | |
| - Detrimental substitutions appear in red with negative values. | |
| As a rule of thumb, scores with an absolute value of *4* or more are considered significant. For instance: | |
| - A substitution scoring *-6* is likely detrimental to protein functionality. | |
| - A score of *+2* is generally regarded as neutral. | |
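The rule of thumb above can be written as a small helper for post-processing scores. This is only a sketch: the function name `interpret_score` and the labels are hypothetical, and the threshold of 4 is taken from the guideline above rather than from any hard cutoff in the tool.

```python
def interpret_score(score: float, threshold: float = 4.0) -> str:
    """Label a substitution score using the rule-of-thumb threshold."""
    if score <= -threshold:
        return "likely detrimental"
    if score >= threshold:
        return "likely beneficial"
    return "neutral"


print(interpret_score(-6))  # likely detrimental
print(interpret_score(2))   # neutral
```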
| The **Download raw data** button lets you download the output in CSV format. | |
| **If you use this tool in your research, please cite**: | |
| Totaro MG, Vide U, Zausinger R, Winkler A, Oberdorfer G. ESM-scan—A tool to guide amino acid substitutions. *Protein Science.* 2024; 33(12):e5221. [doi.org/10.1002/pro.5221](https://doi.org/10.1002/pro.5221) | |