Ningyu committed on
Commit ee68771 · verified · 1 Parent(s): 43ec427

Update README.md

Files changed (1)
  1. README.md +25 -45
README.md CHANGED
@@ -17,10 +17,9 @@ tags:
17
  <h2 align="center"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>
18
 
19
  <p align="center">
20
- <a href="https://www.zjukg.org/project/ChatCell">💻 Project Page</a> •
21
  <a href="https://huggingface.co/datasets/zjunlp/ChatCell-Instructions">🤗 Dataset</a> •
22
  <a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
23
- <a href="https://arxiv.org/abs/2402.08303">📑 Paper</a> •
24
  <a href="#1">🏖️ Overview</a> •
25
  <a href="#2">🧬 Single-cell Analysis Tasks</a> •
26
  <a href="#3">🛠️ Quickstart</a> •
@@ -36,37 +35,34 @@ tags:
36
 
37
  ## 📌 Table of Contents
38
 
39
- - [🏖️ Overview](#1)
40
- - [🧬 Single-cell Analysis Tasks](#2)
41
- - [🛠️ Quickstart](#3)
42
- - [📝 Cite](#4)
43
-
44
 
45
  ---
46
 
47
- <h2 id="1">🏖️ Overview</h2>
48
 
49
- **Background**
50
- - Single-cell biology examines the intricate functions of the cells, ranging from energy production to genetic information transfer, playing a critical role in unraveling the fundamental principles of life and mechanisms influencing health and disease.
51
- - The field has witnessed a surge in single-cell RNA sequencing (scRNA-seq) data, driven by advancements in high-throughput sequencing and reduced costs.
52
- - Traditional single-cell foundation models leverage extensive scRNA-seq datasets, applying NLP techniques to analyze gene expression matrices—structured formats that simplify scRNA-seq data into computationally tractable representations—during pre-training. They are subsequently fine-tuned for distinct single-cell analysis tasks, as shown in Figure (a).
53
 
54
- <p align="center">
55
- <img src="./figures/overview.jpg" alt="image" width=100%>
56
- </p>
57
- <div align="center">
58
- Figure 1: (a) Comparison of traditional single-cell engineering and <b>ChatCell</b>. (b) Overview of <b>ChatCell</b>.
59
- </div>
60
- <br>
61
- We present <b>ChatCell</b>, a new paradigm that leverages natural language to make single-cell analysis more accessible and intuitive.
62
 
63
- - Initially, we convert scRNA-seq data into a single-cell language that LLMs can readily interpret.
64
- - Subsequently, we employ templates to integrate this single-cell language with task descriptions and target outcomes, creating comprehensive single-cell instructions.
65
- - To improve the LLM's expertise in the single-cell domain, we conduct vocabulary adaptation, enriching the model with a specialized single-cell lexicon.
66
- - Following this, we utilize unified sequence generation to empower the model to adeptly execute a range of single-cell tasks.
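
The first two steps above can be sketched roughly as follows. This is a minimal illustration, assuming that the single-cell language is a list of gene names ordered by descending expression (the Quickstart prompt's "ranked by expression level" suggests this, but the README does not spell it out). The function names, the toy expression values, the task wording, and the target label are all hypothetical and are not ChatCell's actual preprocessing code.

```python
from typing import Dict, Mapping


def cell_to_sentence(expression: Mapping[str, float], top_k: int = 100) -> str:
    """Turn one cell's gene-expression profile into a space-separated gene list."""
    # Keep expressed genes only, then sort by descending expression.
    # Ties are broken alphabetically so the output is deterministic.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in ranked[:top_k])


def build_instruction(task_description: str, cell_sentence: str, target: str) -> Dict[str, str]:
    """Combine a task description, a cell sentence, and a target into one example."""
    return {"source": f"{task_description} {cell_sentence}", "target": target}


# Hypothetical toy cell: gene name -> expression value (made-up numbers).
toy_cell = {"CD3E": 4.0, "MS4A1": 0.0, "GNLY": 7.5, "NKG7": 7.5, "ACTB": 12.0}

sentence = cell_to_sentence(toy_cell, top_k=3)
example = build_instruction("Identify the cell type of this cell:", sentence, "NK cell")

print(sentence)
print(example)
```

Unexpressed genes are dropped before ranking so that zero-count genes never appear in the sentence; whether ChatCell applies the same filter is an assumption.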
 
 
 
67
 
 
 
 
 
68
 
69
- <h2 id="2">🧬 Single-cell Analysis Tasks</h2>
 
70
 
71
  We concentrate on the following single-cell tasks:
72
 
@@ -101,34 +97,18 @@ The drug sensitivity prediction task aims to predict the response of different c
101
  <img src="./figures/example4.jpg" alt="image" width=80%>
102
  </p>
103
 
104
- <h2 id="3">🛠️ Quickstart</h2>
105
-
106
- ```python
107
- from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
108
-
109
- tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-base")
110
- model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-base")
111
- input_text="Detail the 100 starting genes for a Mix, ranked by expression level: "
112
-
113
- # Encode the input text and generate a response with specified generation parameters
114
- input_ids = tokenizer(input_text,return_tensors="pt").input_ids
115
- output_ids = model.generate(input_ids, max_length=512, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, do_sample=True)
116
-
117
- # Decode and print the generated output text
118
- output_text = tokenizer.decode(output_ids[0],skip_special_tokens=True)
119
- print(output_text)
120
- ```
121
 
 
 
122
 
123
 
124
- <h2 id="4">📝 Cite</h2>
125
 
126
- If you use our repository, please cite the following related paper:
127
  ```
128
  @article{fang2024chatcell,
129
  title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
130
  author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
131
- journal={arXiv preprint arXiv:2402.08303},
132
  year={2024},
133
  }
134
  ```
 
17
  <h2 align="center"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>
18
 
19
  <p align="center">
20
+ <a href="https://chat.openai.com/g/g-vUwj222gQ-chatcell">💻GPTStore App</a> •
21
  <a href="https://huggingface.co/datasets/zjunlp/ChatCell-Instructions">🤗 Dataset</a> •
22
  <a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
 
23
  <a href="#1">🏖️ Overview</a> •
24
  <a href="#2">🧬 Single-cell Analysis Tasks</a> •
25
  <a href="#3">🛠️ Quickstart</a> •
 
35
 
36
  ## 📌 Table of Contents
37
 
38
+ - [🛠️ Quickstart](#2)
39
+ - [🧬 Single-cell Analysis Tasks](#3)
40
+ - [✨ Acknowledgements](#4)
41
+ - [📝 Cite](#5)
 
42
 
43
  ---
44
 
 
45
 
46
+ <h2 id="3">🛠️ Quickstart</h2>
 
 
 
47
 
48
+ ```python
49
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
 
 
 
 
 
50
 
51
+ tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-base")
52
+ model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-base")
53
+ input_text="Detail the 100 starting genes for a Mix, ranked by expression level: "
54
+
55
+ # Encode the input text and generate a response with specified generation parameters
56
+ input_ids = tokenizer(input_text,return_tensors="pt").input_ids
57
+ output_ids = model.generate(input_ids, max_length=512, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, do_sample=True)
58
 
59
+ # Decode and print the generated output text
60
+ output_text = tokenizer.decode(output_ids[0],skip_special_tokens=True)
61
+ print(output_text)
62
+ ```
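
The Quickstart snippet hard-codes one prompt and prints raw text. As a hedged sketch, the helpers below rebuild that prompt for other cell types and split a generated string into gene names. Both helper names are hypothetical, and the assumption that the model returns whitespace-separated gene symbols is not confirmed by the README; the sample output string is made up for illustration.

```python
from typing import List


def build_generation_prompt(cell_type: str, n_genes: int = 100) -> str:
    """Recreate the Quickstart prompt for an arbitrary cell type and gene count."""
    return f"Detail the {n_genes} starting genes for a {cell_type}, ranked by expression level: "


def parse_gene_list(output_text: str) -> List[str]:
    """Split generated text into gene names, dropping repeats but keeping order."""
    seen = set()
    genes = []
    for token in output_text.split():
        if token not in seen:
            seen.add(token)
            genes.append(token)
    return genes


prompt = build_generation_prompt("Mix")
# Made-up stand-in for the decoded `output_text` from the Quickstart.
genes = parse_gene_list("MALAT1 B2M TMSB4X B2M ACTB")

print(prompt)
print(genes)
```

In practice, `prompt` would replace `input_text` in the Quickstart and `parse_gene_list` would be applied to the decoded `output_text`.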
63
 
64
+
65
+ <h2 id="3">🧬 Single-cell Analysis Tasks</h2>
66
 
67
  We concentrate on the following single-cell tasks:
68
 
 
97
  <img src="./figures/example4.jpg" alt="image" width=80%>
98
  </p>
99
 
100
+ <h2 id="4">📝 ✨ Acknowledgements</h2>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
101
 
102
+ Special thanks to the authors of [Cell2Sentence: Teaching Large Language Models the Language of Biology](https://github.com/vandijklab/cell2sentence-ft) and [Representing cells as sentences enables natural-language processing for single-cell transcriptomics
103
+ ](https://github.com/rahuldhodapkar/cell2sentence) for their inspiring work.
104
 
105
 
106
+ <h2 id="5">📝 Cite</h2>
107
 
 
108
  ```
109
  @article{fang2024chatcell,
110
  title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
111
  author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
 
112
  year={2024},
113
  }
114
  ```