license: mit

This is a refined version of a dataset obtained from UniProt (see here). The data was first sorted by family, then random families were selected until approximately 20% of the data was separated out as test data. Next, each sequence longer than 1000 residues was segmented into non-overlapping sections of 1000 amino acids or fewer. Any sequences with only partial binding site annotations (those whose annotations contain <, >, or ?) were thrown out.
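
The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the dataset: the record layout `(family, sequence, labels)`, the per-residue `labels` string, and the function names `split_by_family`, `segment`, and `has_partial_annotation` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_family(records, test_fraction=0.2, seed=0):
    """Assign whole families to the test set until it holds ~test_fraction of the data.

    `records` is assumed to be a list of (family, sequence, labels) tuples.
    """
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec[0]].append(rec)

    # Shuffle the family names so that families are picked at random.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    target = test_fraction * len(records)
    train, test = [], []
    for fam in families:
        # Keep adding whole families to the test set until the target is reached.
        if len(test) < target:
            test.extend(by_family[fam])
        else:
            train.extend(by_family[fam])
    return train, test


def segment(sequence, labels, max_len=1000):
    """Cut a sequence and its labels into non-overlapping chunks of at most max_len."""
    return [
        (sequence[i:i + max_len], labels[i:i + max_len])
        for i in range(0, len(sequence), max_len)
    ]


def has_partial_annotation(annotation):
    """Return True if an annotation string contains <, > or ? (uncertain positions)."""
    return any(ch in annotation for ch in "<>?")
```

Because whole families are assigned to one side of the split, no family appears in both the training and the test data, which is the point of splitting by family rather than by individual sequence.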

Note: Copied from https://huggingface.co/datasets/AmelieSchreiber/binding_sites_random_split_by_family