Geneformer / geneformer / tokenizer.py

Commit History

Addressed issues for tokenizer, anndata tokenizer now uses a fraction of memory
b24676d

ricomnl committed on

Generalized
5cb733f

ricomnl committed on

Added anndata tokenizer and switched to Dataset.from_generator
b6ca566

ricomnl committed on

Add error for no files found and suppress loompy import warning
abdf980

Christina Theodoris committed on

Update tokenizer to allow tokenization without custom cell attributes
57b9778

Christina Theodoris committed on

Modify tokenizer to allow renaming attr names btwn loom and .dataset
e78c44d

Christina Theodoris committed on

Add further explanation regarding input file format for transcriptome tokenizer
c34ead6

Christina Theodoris committed on

Add further explanation to tokenizer example script and updated tokenizer to match loompy raised error
78dd83b

Christina Theodoris committed on

Fix bug with metadata when processing multiple .loom files (#3)
044d737

ctheodoris, davidjwen committed on

Add data collator for cell classification and example for cell classification
088ea6e

Christina Theodoris committed on

Add Geneformer tokenizer and updated model card
5426788

Christina Theodoris committed on