jwu418 committed
Commit 7d1e291 · 1 Parent(s): 936add1

upload

- .gitattributes +2 -0
- README.md +27 -0
- assets/batch_correlation.png +3 -0
- assets/overview.png +3 -0
- pretrain.pth +3 -0
- vocab/atac_vocab.json +3 -0
- vocab/batch_bmmc.json +3 -0
- vocab/batch_full.json +3 -0
- vocab/batch_kidney.json +3 -0
- vocab/batch_pbmc.json +3 -0
- vocab/cell_bmmc.json +3 -0
- vocab/cell_full.json +3 -0
- vocab/cell_kidney.json +3 -0
- vocab/cell_pbmc.json +3 -0
- vocab/chr_vocab.json +3 -0
- vocab/gene2chr.json +3 -0
- vocab/rna_vocab.json +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,30 @@
 ---
 license: apache-2.0
 ---
+
+# EpiFoundation: A Foundation Model for Single-Cell ATAC-seq via Peak-to-Gene Alignment
+
+This repo contains the pre-trained weights of **EpiFoundation** from our paper: [EpiFoundation: A Foundation Model for Single-Cell ATAC-seq via Peak-to-Gene Alignment]()
+
+
+
+## Introduction
+
+Foundation models exhibit strong capabilities for downstream tasks by learning generalized representations through self-supervised pre-training on large datasets. While several foundation models have been developed for single-cell RNA-seq (scRNA-seq) data, there is still a lack of models specifically tailored for single-cell ATAC-seq (scATAC-seq), which measures epigenetic information in individual cells. The principal challenge in developing such a model lies in the vast number of scATAC peaks and the significant sparsity of the data, which complicates the formulation of peak-to-peak correlations. To address this challenge, we introduce **EpiFoundation**, a foundation model for learning cell representations from the high-dimensional and sparse space of peaks. EpiFoundation relies on an innovative cross-modality pre-training procedure with two key technical innovations. First, EpiFoundation exclusively processes the non-zero peak set, thereby enhancing the density of cell-specific information within the input data. Second, EpiFoundation utilizes dense gene expression information to supervise the pre-training process, aligning peak-to-gene correlations. EpiFoundation can handle various types of downstream tasks, including cell-type annotation, batch correction, and gene expression prediction. To train and validate EpiFoundation, we curated **MiniAtlas**, a dataset of 100,000+ single cells with paired scRNA-seq and scATAC-seq data, along with diverse test sets spanning various tissues and cell types for robust evaluation. EpiFoundation demonstrates **state-of-the-art performance across multiple tissues and diverse downstream tasks**.
+
+
+
+
+## Acknowledgment
+
+We would like to thank the TPU Research Cloud (TRC) program and the Google Cloud Research Credits program for supporting our computing needs. W.H. and Z.J. are supported by the National Institute of General Medical Sciences of the National Institutes of Health (NIH) under Award Numbers R35GM150887 and R35GM154865, respectively.
+
+## Citation
+
+```
+
+```
+
+## Contact
+
+If you have any questions, please feel free to raise an issue or contact us directly: Juncheng Wu ([email protected])
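Note on the README above: the Introduction says EpiFoundation "exclusively processes the non-zero peak set". The snippet below is only a minimal sketch of that idea, not the authors' implementation. The function name `nonzero_peak_tokens`, the toy peak names, and the assumption that a vocabulary maps peak names to integer ids are all hypothetical; this commit does not show the layout of `vocab/atac_vocab.json`.

```python
from typing import Dict, List, Sequence, Tuple


def nonzero_peak_tokens(
    peak_names: Sequence[str],
    counts: Sequence[float],
    vocab: Dict[str, int],
) -> Tuple[List[int], List[float]]:
    """Keep only peaks with a non-zero count and map them to vocabulary ids.

    Hypothetical helper: `vocab` is assumed to map peak name -> integer id.
    A peak missing from `vocab` raises KeyError.
    """
    if len(peak_names) != len(counts):
        raise ValueError("peak_names and counts must have the same length")

    token_ids: List[int] = []
    values: List[float] = []
    for name, count in zip(peak_names, counts):
        if count != 0:  # drop closed (zero-count) peaks entirely
            token_ids.append(vocab[name])
            values.append(count)
    return token_ids, values


# Toy cell with four peaks, two of which are accessible.
toy_vocab = {"<pad>": 0, "chr1:100-600": 1, "chr1:900-1400": 2,
             "chr2:50-550": 3, "chrX:10-510": 4}
toy_peaks = ["chr1:100-600", "chr1:900-1400", "chr2:50-550", "chrX:10-510"]
toy_counts = [0, 2, 0, 1]

ids, vals = nonzero_peak_tokens(toy_peaks, toy_counts, toy_vocab)
print(ids, vals)
```

With this kind of filtering, the input sequence length scales with the number of accessible peaks in a cell rather than with the full peak vocabulary.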
assets/batch_correlation.png
ADDED
Binary file stored with Git LFS.
assets/overview.png
ADDED
Binary file stored with Git LFS.
pretrain.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2da08ff4572738d90fd70d38d792850821bb21c2cf52838286537542b991ab59
+size 3606582266
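Note on the pointer above: the three lines are a Git LFS pointer, which records the SHA-256 and byte size of the real `pretrain.pth`. After downloading the checkpoint, a local copy can be checked against these values. The following is a minimal sketch using only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Pointer contents copied from the pretrain.pth entry in this commit.
PRETRAIN_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2da08ff4572738d90fd70d38d792850821bb21c2cf52838286537542b991ab59
size 3606582266
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Union[str, Path], pointer_text: str,
                    chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")

    file_path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a multi-GB file.
    if file_path.stat().st_size != int(fields["size"]):
        return False

    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Read in chunks so the whole checkpoint is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Usage, assuming the checkpoint was saved as `pretrain.pth` in the working directory: `matches_pointer("pretrain.pth", PRETRAIN_POINTER)` returns `True` only if the download is complete and unmodified.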
vocab/atac_vocab.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:54218ddf3df9ade25bb2edde3aed1eb235db0df08d86593a8caccbf464740f3b
+size 19237573
vocab/batch_bmmc.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eeb41579c46e26c8bbef4eb31faff09e1b5530b970bd6b8a3368442e31b062f1
+size 388
vocab/batch_full.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9f5eecb6a3e0f7c48b5aa54beac614832336b9bdc777fb6a82d33d45c27b1a77
+size 511
vocab/batch_kidney.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1b377176ea3b2d8949366cd941e8b1097d889f55de390bd3012b5fa8a8cb612b
+size 162
vocab/batch_pbmc.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f395b1bab1a099379bd8d79f37b43a76704a35f7ab96fc1a247b40776d97b2b3
+size 290
vocab/cell_bmmc.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:18652184381adea79036bc16eb360c03d228c0f1c24a1f2c75920727cea20dfe
+size 245
vocab/cell_full.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c80936d3a9c2857cdb8bfec91a32a0feed738bcc2dad1f7184f270236cd64728
+size 1436
vocab/cell_kidney.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a9ebfe97971b55262b88e266afd3a0304a91650d4dc214bf9e99e2f55e3edaed
+size 181
vocab/cell_pbmc.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5fe05f0fb19dec827c8f194c00e48da08e667053e5b4b0b4970a43ffe92a7636
+size 160
vocab/chr_vocab.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f684e54533b6655884018150fe1d37bfe719cde96997daf09fbd5278e8f9bf2b
+size 716
vocab/gene2chr.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:12d275fc8f1f446092ad90f3e6041ee771beb10c79d371c0468faec8666d23ab
+size 728529
vocab/rna_vocab.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e010c4379db6ceede6e23d72c5b351502a3470728df9740f833003f3aeaea00
+size 736136