metadata
license: mit
This is a refined version of a dataset obtained from UniProt (see here).
The data was first sorted by family, then random families were selected until approximately 20% of the data was separated out as test data.
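The family-based split described above might look roughly like the following. This is a minimal sketch, not the script used to build the dataset: the `(family, sequence)` record layout, the function name `split_by_family`, and the `seed` parameter are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_family(records, test_fraction=0.2, seed=0):
    """Split (family, sequence) records so that no family spans both sets.

    `records` is a hypothetical list of (family, sequence) tuples; the real
    dataset's column names and layout may differ.
    """
    # Group sequences by their family label.
    by_family = defaultdict(list)
    for family, sequence in records:
        by_family[family].append(sequence)

    # Shuffle the family names reproducibly.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    # Move whole families into the test set until the target size is reached.
    target = test_fraction * len(records)
    test_families = set()
    n_test = 0
    for family in families:
        if n_test >= target:
            break
        test_families.add(family)
        n_test += len(by_family[family])

    train = [(f, s) for f, s in records if f not in test_families]
    test = [(f, s) for f, s in records if f in test_families]
    return train, test
```

Because whole families are moved at once, the test set only approximates the requested fraction, which matches the "approximately 20%" wording above.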
Next, each sequence longer than 1000 residues was segmented into non-overlapping sections of 1000 amino acids or fewer. Any sequences with only partial binding site annotations were thrown out (any sequences whose annotations contained `<`, `>`, or `?`).
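As a rough illustration of the filtering and segmentation steps above, the sketch below assumes the binding site annotation is available as a plain string and that sequences are plain amino-acid strings. The helper names `has_partial_annotation` and `segment` are hypothetical and are not taken from the original preprocessing code.

```python
def has_partial_annotation(annotation):
    """Return True if the annotation string contains '<', '>', or '?'."""
    return any(marker in annotation for marker in "<>?")


def segment(sequence, max_len=1000):
    """Cut a sequence into consecutive, non-overlapping chunks of at most max_len residues."""
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]
```

With these helpers, a 2500-residue sequence would yield chunks of 1000, 1000, and 500 residues, and any entry for which `has_partial_annotation` returns `True` would be dropped.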
Note: Copied from https://huggingface.co/datasets/AmelieSchreiber/binding_sites_random_split_by_family