jwu418 committed on
Commit 7d1e291 · 1 Parent(s): 936add1
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.json filter=lfs diff=lfs merge=lfs -text
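The two added rules explain why the PNG and JSON files later in this commit appear as Git LFS pointers rather than raw content. A minimal sketch of that matching follows, assuming only the slash-free patterns visible in this hunk and approximating Git's matching with Python's `fnmatch` on the file name. The helper name `is_lfs_tracked` is hypothetical, and real `.gitattributes` matching has more rules (for example, patterns containing `/` such as `saved_model/**/*`).

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Slash-free patterns visible in the .gitattributes hunk above.
# Rules elsewhere in the file are not shown in this diff and are not modeled here.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.png", "*.json"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does the file name match any slash-free pattern?"""
    name = basename(path)  # slash-free patterns match at any directory level
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in ["assets/batch_correlation.png", "vocab/atac_vocab.json", "README.md"]:
    print(path, is_lfs_tracked(path))
```

Under these assumptions, the files under `assets/` and `vocab/` match the new rules, while `README.md` stays a regular text file.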
README.md CHANGED
@@ -1,3 +1,30 @@
  ---
  license: apache-2.0
  ---
+
+ # EpiFoundation: A Foundation Model for Single-Cell ATAC-seq via Peak-to-Gene Alignment
+
+ This repo contains the pre-trained weights of **EpiFoundation** from our paper: [EpiFoundation: A Foundation Model for Single-Cell ATAC-seq via Peak-to-Gene Alignment]()
+
+ ![image-20250204132627694](./assets/framework.png)
+
+ ## Introduction
+
+ Foundation models exhibit strong capabilities for downstream tasks by learning generalized representations through self-supervised pre-training on large datasets. While several foundation models have been developed for single-cell RNA-seq (scRNA-seq) data, there is still a lack of models specifically tailored for single-cell ATAC-seq (scATAC-seq), which measures epigenetic information in individual cells. The principal challenge in developing such a model lies in the vast number of scATAC peaks and the significant sparsity of the data, which complicates the formulation of peak-to-peak correlations. To address this challenge, we introduce **EpiFoundation**, a foundation model for learning cell representations from the high-dimensional and sparse space of peaks. EpiFoundation relies on a cross-modality pre-training procedure with two key technical innovations. First, EpiFoundation exclusively processes the non-zero peak set, thereby enhancing the density of cell-specific information within the input data. Second, EpiFoundation utilizes dense gene expression information to supervise the pre-training process, aligning peak-to-gene correlations. EpiFoundation can handle various types of downstream tasks, including cell-type annotation, batch correction, and gene expression prediction. To train and validate EpiFoundation, we curated **MiniAtlas**, a dataset of 100,000+ single cells with paired scRNA-seq and scATAC-seq data, along with diverse test sets spanning various tissues and cell types for robust evaluation. EpiFoundation demonstrates **state-of-the-art performance across multiple tissues and diverse downstream tasks**.
+
+ ![image-20250204133126719](./assets/batch_correlation.png)
+
+
+ ## Acknowledgment
+
+ We would like to thank the TPU Research Cloud (TRC) program and the Google Cloud Research Credits program for supporting our computing needs. W.H. and Z.J. are supported by the National Institute of General Medical Sciences of the National Institutes of Health (NIH) under Award Numbers R35GM150887 and R35GM154865, respectively.
+
+ ## Citation
+
+ ```
+
+ ```
+
+ ## Contact
+
+ If you have any questions, please feel free to raise an issue or contact us directly: Juncheng Wu ([email protected])
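The README's first design point, feeding the model only the non-zero peaks of each cell, can be illustrated with a small sketch. This is not the authors' implementation: the function name `nonzero_peaks`, the dense list input, and the use of the column position as the peak identifier are all assumptions made for illustration.

```python
def nonzero_peaks(cell_vector):
    """Return (indices, values) of the non-zero entries of one cell's peak vector.

    Assumption: `cell_vector` is a dense sequence of per-peak accessibility
    counts, and a peak is identified by its position in that sequence.
    """
    pairs = [(i, v) for i, v in enumerate(cell_vector) if v != 0]
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return indices, values


# Toy cell with 8 peaks, 3 of them accessible.
cell = [0, 0, 1, 0, 2, 0, 0, 1]
indices, values = nonzero_peaks(cell)
print(indices, values)  # [2, 4, 7] [1, 2, 1]
```

Dropping the zero entries shortens the input from the full peak vocabulary to the few peaks observed in a given cell, which is the density gain the introduction describes.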
assets/batch_correlation.png ADDED

Git LFS Details

  • SHA256: cd6adada1b04727c979f126298cc2e5768a512cb6481423c4a8294acec84f07b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.92 MB
assets/overview.png ADDED

Git LFS Details

  • SHA256: d9bd17f26f2568a32f638b9289e4116d27d639a2330167d164b3e9274d2b6ee7
  • Pointer size: 132 Bytes
  • Size of remote file: 1.25 MB
pretrain.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2da08ff4572738d90fd70d38d792850821bb21c2cf52838286537542b991ab59
+ size 3606582266
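Each file added below is stored as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than its real content. A minimal sketch of reading such a pointer follows, using the `pretrain.pth` pointer above as input. The helper name `parse_lfs_pointer` is hypothetical, and the parser assumes only the three keys seen in this commit.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer made of 'key value' lines into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2da08ff4572738d90fd70d38d792850821bb21c2cf52838286537542b991ab59
size 3606582266
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The `size` field shows that the checkpoint is roughly 3.6 GB, so a plain clone without Git LFS installed would fetch only this small pointer, not the weights.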
vocab/atac_vocab.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54218ddf3df9ade25bb2edde3aed1eb235db0df08d86593a8caccbf464740f3b
+ size 19237573
vocab/batch_bmmc.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eeb41579c46e26c8bbef4eb31faff09e1b5530b970bd6b8a3368442e31b062f1
+ size 388
vocab/batch_full.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f5eecb6a3e0f7c48b5aa54beac614832336b9bdc777fb6a82d33d45c27b1a77
+ size 511
vocab/batch_kidney.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b377176ea3b2d8949366cd941e8b1097d889f55de390bd3012b5fa8a8cb612b
+ size 162
vocab/batch_pbmc.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f395b1bab1a099379bd8d79f37b43a76704a35f7ab96fc1a247b40776d97b2b3
+ size 290
vocab/cell_bmmc.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18652184381adea79036bc16eb360c03d228c0f1c24a1f2c75920727cea20dfe
+ size 245
vocab/cell_full.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c80936d3a9c2857cdb8bfec91a32a0feed738bcc2dad1f7184f270236cd64728
+ size 1436
vocab/cell_kidney.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9ebfe97971b55262b88e266afd3a0304a91650d4dc214bf9e99e2f55e3edaed
+ size 181
vocab/cell_pbmc.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5fe05f0fb19dec827c8f194c00e48da08e667053e5b4b0b4970a43ffe92a7636
+ size 160
vocab/chr_vocab.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f684e54533b6655884018150fe1d37bfe719cde96997daf09fbd5278e8f9bf2b
+ size 716
vocab/gene2chr.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:12d275fc8f1f446092ad90f3e6041ee771beb10c79d371c0468faec8666d23ab
+ size 728529
vocab/rna_vocab.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e010c4379db6ceede6e23d72c5b351502a3470728df9740f833003f3aeaea00
+ size 736136