---
pipeline_tag: text-generation
tags:
- biology
- single-cell
- single-cell analysis
- text-generation-inference
---





<div align="center">

<img src="./figures/logo.png" alt="image" width=8%>

<h2 align="center"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>

<p align="center">
  <a href="https://chat.openai.com/g/g-vUwj222gQ-chatcell">💻 GPTStore App</a> •
  <a href="https://huggingface.co/datasets/zjunlp/ChatCell-Instructions">🤗 Dataset</a> •
  <a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
  <a href="#1">🏖️ Overview</a> •
  <a href="#2">🛠️ Quickstart</a> •
  <a href="#3">🧬 Single-cell Analysis Tasks</a> •
  <a href="#5">📝 Cite</a>
</p>


<img src="./figures/intro.jpg" alt="image" width=60%>
<b>ChatCell</b> lets researchers give instructions in either natural language or single-cell language to carry out common single-cell analysis tasks. Black text denotes human language and red text denotes single-cell language.

</div>


## 📌 Table of Contents

- [🛠️ Quickstart](#2)
- [🧬 Single-cell Analysis Tasks](#3)
- [✨ Acknowledgements](#4)
- [📝 Cite](#5)

---

<h2 id="2">🛠️ Quickstart</h2>

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-large")
model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-large")

input_text = "Detail the 100 starting genes for a Mix, ranked by expression level: "

# Encode the input text and generate a response with the specified sampling parameters
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_length=512,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    top_k=50,
    top_p=0.95,
    do_sample=True,
)

# Decode and print the generated output text
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```
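
The generated text is expected to be a cell sentence, that is, gene names listed in order of expression level. The following is a minimal sketch of turning such a string into a ranked gene list. It assumes the output is a whitespace-separated list of gene symbols; the helper name `parse_cell_sentence` and its `max_genes` parameter are hypothetical and not part of ChatCell.

```python
def parse_cell_sentence(text, max_genes=None):
    """Split a cell sentence into a ranked gene list, keeping first occurrences only.

    Assumption: genes are separated by whitespace and appear in rank order.
    """
    seen = set()
    genes = []
    for token in text.split():
        gene = token.strip(".,;:")  # drop stray punctuation around a gene symbol
        if not gene or gene in seen:
            continue
        seen.add(gene)
        genes.append(gene)
    if max_genes is not None:
        genes = genes[:max_genes]
    return genes


# Hypothetical output string, used only to illustrate the parsing step
example = "MALAT1 B2M TMSB4X MALAT1 ACTB."
print(parse_cell_sentence(example))  # ['MALAT1', 'B2M', 'TMSB4X', 'ACTB']
```

Because sampling is enabled in the Quickstart call, repeated or malformed tokens can appear, so de-duplicating before downstream use is a cautious default rather than a requirement.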



<h2 id="3">🧬 Single-cell Analysis Tasks</h2>

ChatCell can handle the following single-cell tasks:

- <b>Random Cell Sentence Generation.</b>
Random cell sentence generation challenges the model to create cell sentences devoid of predefined biological conditions or constraints. This task aims to evaluate the model's ability to generate valid and contextually appropriate cell sentences, potentially simulating natural variations in cellular behavior. 

<p align="center">
<img src="./figures/example1.jpg" alt="image" width=80%>
</p>


- <b>Pseudo-cell Generation.</b>
Pseudo-cell generation focuses on generating gene sequences tailored to specific cell type labels. This task is vital for unraveling gene expression and regulation across different cell types, offering insights for medical research and disease studies, particularly in the context of diseased cell types.


<p align="center">
<img src="./figures/example2.jpg" alt="image" width=80%>
</p>

- <b>Cell Type Annotation.</b>
For cell type annotation, the model is tasked with precisely classifying cells into their respective types based on gene expression patterns encapsulated in cell sentences. This task is fundamental for understanding cellular functions and interactions within tissues and organs, playing a crucial role in developmental biology and regenerative medicine.

<p align="center">
<img src="./figures/example3.jpg" alt="image" width=80%>
</p>

- <b>Drug Sensitivity Prediction.</b>
The drug sensitivity prediction task aims to predict the response of different cells to various drugs. It is pivotal in designing effective, personalized treatment plans and contributes significantly to drug development, especially in optimizing drug efficacy and safety.


<p align="center">
<img src="./figures/example4.jpg" alt="image" width=80%>
</p>
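
All four tasks read or write cell sentences. As a rough illustration of the idea, the sketch below builds a cell sentence from a gene expression profile by ranking expressed genes from highest to lowest, in the spirit of the Cell2Sentence work acknowledged below. The function name `to_cell_sentence`, the `top_k` default, and the alphabetical tie-breaking rule are assumptions made for this example, not the ChatCell preprocessing pipeline.

```python
def to_cell_sentence(expression, top_k=100):
    """Return a space-separated string of gene names ranked by descending expression.

    expression: dict mapping gene name -> expression value for one cell.
    Genes with zero expression are dropped; ties are broken alphabetically
    so that the result is deterministic (an assumption of this sketch).
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in expressed[:top_k])


# Toy expression profile with made-up counts, for illustration only
counts = {"ACTB": 12, "MALAT1": 40, "CD3E": 0, "B2M": 25}
print(to_cell_sentence(counts))  # MALAT1 B2M ACTB
```

A sentence built this way could be appended to a natural-language instruction for tasks such as cell type annotation, though the exact prompt wording the model expects is not specified here.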

<h2 id="4">✨ Acknowledgements</h2>

Special thanks to the authors of [Cell2Sentence: Teaching Large Language Models the Language of Biology](https://github.com/vandijklab/cell2sentence-ft) and [Representing cells as sentences enables natural-language processing for single-cell transcriptomics](https://github.com/rahuldhodapkar/cell2sentence) for their inspiring work.


<h2 id="5">📝 Cite</h2>

```bibtex
@article{fang2024chatcell,
  title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
  author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
  year={2024},
}
```