Christina Theodoris committed d468697 (parent: e606e1c)
Update instructions in tokenizing example to clarify requirement for Ensembl ID
examples/tokenizing_scRNAseq_data.ipynb
CHANGED
@@ -15,7 +15,9 @@
 "id": "350e6252-b783-494b-9767-f087eb868a15",
 "metadata": {},
 "source": [
-"#### Input data is a directory with .loom files containing raw counts from single cell RNAseq data, including all genes detected in the transcriptome without feature selection.
+"#### Input data is a directory with .loom files containing raw counts from single cell RNAseq data, including all genes detected in the transcriptome without feature selection. \n",
+"\n",
+"#### Genes should be labeled with Ensembl IDs (row attribute \"ensembl_id\"), which provide a unique identifer for conversion to tokens.\n",
 "\n",
 "#### No cell metadata is required, but custom cell attributes may be passed onto the tokenized dataset by providing a dictionary of custom attributes to be added, which is formatted as loom_col_attr_name : desired_dataset_col_attr_name. For example, if the original .loom dataset has column attributes \"cell_type\" and \"organ_major\" and one would like to retain these attributes as labels in the tokenized dataset with the new names \"cell_type\" and \"organ\", respectively, the following custom attribute dictionary should be provided: {\"cell_type\": \"cell_type\", \"organ_major\": \"organ\"}. \n",
 "\n",
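The updated notebook text describes the input requirements only in prose. Below is a minimal, hypothetical sketch (not the repository's actual tokenizer code) of what they imply: genes are looked up by their "ensembl_id" row attribute, and cell attributes are kept and renamed using a dictionary of the form loom_col_attr_name : desired_dataset_col_attr_name. It assumes the .loom row and column attributes have already been read into plain Python dicts; the function names, the toy token dictionary, and the example values are made up for illustration.

```python
from typing import Dict, List, Sequence

# Row attribute that, per the updated notebook text, must hold each gene's Ensembl ID.
GENE_ID_ATTR = "ensembl_id"


def get_ensembl_ids(row_attrs: Dict[str, Sequence[str]]) -> List[str]:
    """Return the genes' Ensembl IDs, failing early if the row attribute is absent."""
    if GENE_ID_ATTR not in row_attrs:
        raise KeyError(f'genes must be labeled with the row attribute "{GENE_ID_ATTR}"')
    return list(row_attrs[GENE_ID_ATTR])


def ids_to_tokens(ensembl_ids: Sequence[str], token_dict: Dict[str, int]) -> List[int]:
    """Hypothetical lookup: map each Ensembl ID to a token, dropping genes with no token."""
    return [token_dict[gene_id] for gene_id in ensembl_ids if gene_id in token_dict]


def rename_cell_attrs(
    col_attrs: Dict[str, Sequence[object]],
    custom_attr_name_dict: Dict[str, str],
) -> Dict[str, List[object]]:
    """Keep only the listed cell attributes, renamed according to
    {loom_col_attr_name: desired_dataset_col_attr_name}."""
    missing = [name for name in custom_attr_name_dict if name not in col_attrs]
    if missing:
        raise KeyError(f"column attributes not found in the .loom file: {missing}")
    return {new: list(col_attrs[old]) for old, new in custom_attr_name_dict.items()}


# Toy inputs standing in for attributes read from one .loom file (hypothetical values).
row_attrs = {"ensembl_id": ["ENSG00000141510", "ENSG00000012048", "ENSG00000000003"]}
col_attrs = {
    "cell_type": ["T cell", "B cell"],
    "organ_major": ["blood", "spleen"],
    "batch": ["b1", "b2"],  # not listed in the dictionary below, so it is not carried over
}
toy_token_dict = {"ENSG00000141510": 2, "ENSG00000012048": 3}  # made-up vocabulary

tokens = ids_to_tokens(get_ensembl_ids(row_attrs), toy_token_dict)
labels = rename_cell_attrs(col_attrs, {"cell_type": "cell_type", "organ_major": "organ"})
print(tokens)  # [2, 3]
print(labels)  # {'cell_type': ['T cell', 'B cell'], 'organ': ['blood', 'spleen']}
```

Keying the lookup on Ensembl IDs reflects the notebook's point that they give each gene a unique identifier for conversion to tokens; the sketch does not model how the real tokenizer orders or normalizes genes.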