---
license: apache-2.0
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-7B
---

# TableGPT2-7B

## Model details

We developed and released TableGPT2-7B, a large-scale decoder tailored for data-intensive tasks, with a focus on interpreting and analyzing tabular data. TableGPT2-7B is designed to bridge the gap between conventional LLM capabilities and the real-world demands of tabular and structured data tasks, such as business intelligence (BI), automated data-driven analysis, and applications closely tied to databases or data warehouses.

**Model Developers**  

Zhejiang University

**Variations**  

TableGPT2 is available in two configurations—7B and 72B parameters—both derived from the Qwen2.5 model family and optimized for handling structured data in tabular formats. Currently, we have released the 7B version to the public.

**Input**

TableGPT2-7B accepts both text and tabular data as input, with the tabular data structured as text in the format of a `df.head()` result.

**Output** 

TableGPT2-7B produces text-based outputs, specifically optimized for coding tasks, data interpretation, and BI-focused question answering.
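
Because the output is plain text, any code in it has to be located before it can be run. The sketch below assumes the reply wraps code in Markdown fences, which this card does not specify. `extract_code_blocks` is a hypothetical helper, and the sample reply is made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches a fenced block with an optional language tag such as "python".
CODE_BLOCK = re.compile(
    FENCE + r"[ \t]*(?:\w+)?[ \t]*\n(.*?)" + FENCE, re.DOTALL
)


def extract_code_blocks(reply: str) -> list[str]:
    """Return the code found in fenced blocks, or the stripped reply if none."""
    blocks = [match.strip() for match in CODE_BLOCK.findall(reply)]
    return blocks or [reply.strip()]


# Made-up reply, assuming the model answers with a fenced Python block.
reply = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "result = df[df['Opponent'] == 'Padres']\n"
    + FENCE + "\n"
)
print(extract_code_blocks(reply))
```

Falling back to the whole reply keeps the helper usable when the model answers with bare code, although the result should still be reviewed before it is executed.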

**Language**  

Our model places a strong emphasis on Chinese corpora, and currently, queries in other languages may have limited support.

**Other Requirements** 

We highly recommend exploring [our repository on GitHub](https://github.com/tablegpt/tablegpt-agent), where users can integrate this model into our agent workflow for enhanced performance.

**Model Architecture** 

TableGPT2-7B is built upon the Qwen2.5 architecture and includes specialized encoding for tabular data. It features a unique semantic encoder designed to interpret tabular data, capturing insights from rows, columns, and entire tables. Continual Pretraining (CPT) and Supervised Fine-Tuning (SFT) have been applied to equip the model for real-world BI applications and complex query processing. 

For now, only the standalone decoder is open-sourced; it is fully functional without the encoder. The encoder is still being prepared for release, pending engineering work, primarily because we hope to provide tighter integration with DeepSpeed and vLLM.


|              | Training Data                                    | Params | Context Length | Tokens                            | Tables        |
| ------------ | ------------------------------------------------ | ------ | -------------- | --------------------------------- | ------------- |
| TableGPT2-7B | Multimodal data sources and BI-specific examples | 7B     | 128K           | 86B tokens CPT, 2.36M SFT samples | 593.8K tables |

**Status**  

This model is static, trained on an offline dataset. Future versions may be released to enhance its performance on specialized tasks.

**QuickStart**

This code snippet demonstrates how to build a prompt with table information, and shows how to load the tokenizer, load the model, and generate content.

> Note that you need `transformers>=4.37.0` to use `TableGPT2`:
> ```sh
> pip install "transformers>=4.37.0"
> ```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Using pandas to read some structured data
import pandas as pd
from io import StringIO

# single table
EXAMPLE_CSV_CONTENT = """
"Loss","Date","Score","Opponent","Record","Attendance"
"Hampton (14–12)","September 25","8–7","Padres","67–84","31,193"
"Speier (5–3)","September 26","3–1","Padres","67–85","30,711"
"Elarton (4–9)","September 22","3–1","@ Expos","65–83","9,707"
"Lundquist (0–1)","September 24","15–11","Padres","67–83","30,774"
"Hampton (13–11)","September 6","9–5","Dodgers","61–78","31,407"
"""

csv_file = StringIO(EXAMPLE_CSV_CONTENT)
df = pd.read_csv(csv_file)

model_name = "tablegpt/TableGPT2-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

example_prompt_template = """Given access to several pandas dataframes, write the Python code to answer the user's question.

/*
"{var_name}.head(5).to_string(index=False)" as follows:
{df_info}
*/

Question: {user_question}
"""
# English: "Which games have a record of 40 wins and 40 losses?"
question = "哪些比赛的战绩达到了40胜40负?"

prompt = example_prompt_template.format(
    var_name="df",
    df_info=df.head(5).to_string(index=False),
    user_question=question,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
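
The template above mentions several dataframes, while the snippet passes only one. Below is a minimal sketch of how more than one table might be described in a single prompt. Repeating the `/* ... */` block once per dataframe is an assumption, not a documented format. `build_prompt` and the sample table strings are hypothetical and used only for illustration.

```python
# Hypothetical helper: the per-table block layout is an assumption that
# simply repeats the single-table block from the QuickStart template.
HEADER = (
    "Given access to several pandas dataframes, "
    "write the Python code to answer the user's question.\n"
)
TABLE_BLOCK = (
    "/*\n"
    '"{var_name}.head(5).to_string(index=False)" as follows:\n'
    "{df_info}\n"
    "*/\n"
)


def build_prompt(tables: dict[str, str], user_question: str) -> str:
    """Join one block per table, then append the question.

    `tables` maps a variable name to its pre-rendered text, for example
    the result of df.head(5).to_string(index=False).
    """
    blocks = [
        TABLE_BLOCK.format(var_name=name, df_info=info)
        for name, info in tables.items()
    ]
    return "\n".join([HEADER, *blocks, f"Question: {user_question}\n"])


# Made-up table previews, standing in for real df.head(5) output.
prompt = build_prompt(
    {
        "games": "Date Opponent\nSeptember 25 Padres",
        "teams": "Team City\nPadres San Diego",
    },
    "Which city hosts each opponent?",
)
print(prompt)
```

With a single entry in `tables`, the helper reproduces the layout of the QuickStart template, so the result can be passed to `tokenizer.apply_chat_template` in the same way.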

**Deployment**

For deployment, we recommend using vLLM.
* **Install vLLM**: You can install vLLM by running the following command.
  ```bash
  pip install "vllm>=0.4.3"
  ```
* **Model Deployment**: Use vLLM to deploy your model. For example, the following command starts an OpenAI-compatible server:
  ```bash
  python -m vllm.entrypoints.openai.api_server --served-model-name TableGPT2-7B --model path/to/weights
  ```
  You can then query the Chat API:

  ```bash
  curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
      "model": "TableGPT2-7B",
      "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hey, who are you?"}
      ]
      }'
  ```
  For more details about how to use TableGPT2, please refer to [our repository on GitHub](https://github.com/tablegpt/tablegpt-agent).
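
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server was started with the command above, listening on `localhost:8000` and serving the model as `TableGPT2-7B`. `build_chat_request` and `chat` are hypothetical helper names, and the reply is read from `choices[0].message.content` as in the OpenAI chat completions schema.

```python
import json
import urllib.request

# Assumed to match the server started above; adjust if yours differs.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "TableGPT2-7B"


def build_chat_request(user_content: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_content},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_content: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (needs the vLLM server running):
# print(chat("Hey, who are you?"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.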

**License**  

TableGPT2-7B is released under the Apache-2.0 license.

<!-- The TableGPT2-7B license permits both research and commercial use, with further details available in the [GitHub repository](https://github.com/tablegpt/tablegpt-agent). -->

**Research Paper**  

TableGPT2-7B is introduced and validated in the paper "[TableGPT2: A Large Multimodal Model with Tabular Data Integration](https://arxiv.org/pdf/2411.02059)" available on arXiv.

**Where to send questions or comments about the model**  

Inquiries and feedback are welcome at [[email protected]](mailto:[email protected]).

## Training Data

**Overview**  

Training for TableGPT2-7B involved more than 593,800 curated tables, over 86 billion tokens for continual pretraining (CPT), and over 2.36 million high-quality query-table-output tuples constructed for supervised fine-tuning (SFT). This extensive dataset aims to meet the rigorous demands of modern applications involving structured or tabular data.

**Data Freshness**  

The training data has a cutoff of October 2024.

## Evaluation Results

Evaluation has shown that TableGPT2-7B performs consistently well across benchmarks for tabular comprehension, code generation, and structured data reasoning, achieving a **35.20%** performance increase over comparable models on standard benchmarks and **49.32%** on BI-focused assessments. The RealTabBench benchmark further demonstrated the model’s robustness in handling unconventional tables and complex queries. Below, we present the results on public table-related benchmarks.

| **Benchmark**                 | **Metric** | GPT-4o | TableLLM (Qwen2) | TableLLM (CodeQwen) | TableLLM (LLaMA3) | TableLLM (LLaMA3.1) | TableLLM (DeepSeek) | TableLLM-13B | DeepSeek-lite | Yi-Coder | Qwen2.5-Coder | Qwen2.5-Instruct | **TableGPT2-7B** | **TableGPT2-72B** |
| ----------------------------- | ---------- | ------ | ---------------- | ------------------- | ----------------- | ------------------- | ------------------- | ------------ | ------------- | -------- | ------------- | ---------------- | -------------- | --------------- |
| **Table Understanding**       |            |        |                  |                     |                   |                     |                     |              |               |          |               |                  |                |                 |
| Col Type Annot.               | F1         | 31.75  | 10.10            | 5.71                | 1.47              | 1.59                | 6.04                | 12.70        | 20.58         | 5.38     | 32.59         | 22.19            | **85.88**      | 85.67           |
| Relation Extract.             | F1         | 52.95  | 1.60             | 3.79                | 2.39              | 2.00                | 3.34                | 18.16        | 8.67          | 2.25     | 31.00         | 15.92            | **83.35**      | 79.50           |
| Entity Linking                | Acc        | 90.80  | 47.10            | 39.70               | 0.20              | 0.60                | 15.50               | 66.25        | 70.15         | 41.75    | 71.70         | 82.25            | 92.00          | **93.30**       |
| Row Pop.                      | MAP        | 53.40  | 2.20             | 5.14                | 1.93              | 6.23                | 3.13                | 14.25        | 1.20          | 1.00     | 13.23         | 12.30            | **59.97**      | 55.83           |
| **Question Answering**        |            |        |                  |                     |                   |                     |                     |              |               |          |               |                  |                |                 |
| HiTab                         | Exec Acc   | 48.40  | 11.74            | 0.00                | 0.00              | 0.00                | 39.08               | 6.30         | 0.76          | 0.00     | 1.70          | 10.73            | 70.27          | **75.57**       |
| FetaQA                        | BLEU       | 21.70  | 12.24            | 8.69                | 2.42              | 3.10                | 7.94                | 10.83        | 15.08         | 11.17    | 13.00         | 16.91            | 28.97          | **32.25**       |
| HybridQA                      | Acc        | 58.60  | 27.12            | 20.14               | 27.35             | 27.61               | 19.53               | 51.88        | 42.58         | 29.83    | 51.10         | 51.13            | 53.17          | **56.41**       |
| WikiSQL                       | Acc        | 47.60  | 46.50            | 37.20               | 39.26             | 39.00               | 36.14               | 41.10        | 38.30         | 25.34    | 46.90         | 47.42            | 53.74          | **57.32**       |
| WikiTQ                        | Acc        | 68.40  | 64.16            | 36.05               | 34.95             | 38.84               | 36.05               | 66.30        | 47.65         | 43.37    | **74.50**     | 68.55            | 61.42          | 71.45           |
| **Fact Verification**         |            |        |                  |                     |                   |                     |                     |              |               |          |               |                  |                |                 |
| TabFact                       | Acc        | 74.40  | 72.00            | 53.20               | 40.06             | 27.13               | 60.76               | 68.95        | 62.27         | 79.60    | 77.26         | 84.60            | 77.80          | **85.43**       |
| FEVEROUS                      | Acc        | 71.60  | 20.10            | 46.90               | 51.50             | 42.30               | 18.39               | 21.45        | 7.80          | 38.10    | 60.70         | 63.30            | **78.05**      | 76.80           |
| **Table to Text**             |            |        |                  |                     |                   |                     |                     |              |               |          |               |                  |                |                 |
| ToTTo                         | BLEU       | 12.21  | 6.95             | 3.10                | 5.50              | 6.23                | 3.81                | 5.36         | 8.76          | 2.64     | 10.50         | 11.91            | 14.10          | **22.69**       |
| **Natural Language to SQL**   |            |        |                  |                     |                   |                     |                     |              |               |          |               |                  |                |                 |
| BIRD(dev)                     | Exec Acc   | -      | 9.13             | 7.37                | 1.83              | 2.48                | 0.39                | 0.72         | 25.10         | 24.19    | 27.18         | 18.97            | 31.42          | **38.40**       |
| BIRD(dev-knowledge)           | Exec Acc   | -      | 15.45            | 18.19               | 3.39              | 3.72                | 0.39                | 1.83         | 36.51         | 39.96    | 42.96         | 31.42            | 49.28          | **60.76**       |
| Spider(dev)                   | Exec Acc   | -      | 42.26            | 32.88               | 12.86             | 18.96               | 2.71                | 4.26         | 66.44         | 58.12    | 70.99         | 61.70            | 76.31          | **79.40**       |
| Spider(test)                  | Exec Acc   | -      | 40.29            | 34.93               | 12.02             | 16.35               | 7.33                | 2.93         | 66.65         | 56.87    | 69.73         | 60.18            | 74.38          | **78.48**       |
| **Holistic Table Evaluation** |            |        |                  |                     |                   |                     |                     |              |               |          |               |                  |                |                 |
| TableBench                    | DP         | -      | 26.62            | 26.44               | 26.71             | 26.73               | 26.15               | 3.88         | 29.60         | 21.94    | 28.67         | 25.18            | 32.03          | **38.90**       |
| TableBench                    | TCoT       | -      | 37.08            | 31.33               | 29.79             | 30.01               | 28.65               | 3.85         | 30.93         | 22.80    | 36.25         | 29.77            | 42.34          | **50.06**       |
| TableBench                    | SCoT       | -      | 14.11            | 17.78               | 9.60              | 12.38               | 22.39               | 2.88         | 22.61         | 8.43     | 25.95         | 24.35            | 25.01          | **30.47**       |
| TableBench                    | PoT@1      | -      | 21.05            | 26.39               | 31.96             | 25.80               | 28.39               | 2.94         | 10.90         | 11.36    | 16.15         | 22.58            | **33.52**      | 28.98           |

## Citation

If you find our work helpful, please cite it as follows:

```bibtex
@misc{su2024tablegpt2largemultimodalmodel,
      title={TableGPT2: A Large Multimodal Model with Tabular Data Integration}, 
      author={Aofeng Su and Aowen Wang and Chao Ye and Chen Zhou and Ga Zhang and Guangcheng Zhu and Haobo Wang and Haokai Xu and Hao Chen and Haoze Li and Haoxuan Lan and Jiaming Tian and Jing Yuan and Junbo Zhao and Junlin Zhou and Kaizhe Shou and Liangyu Zha and Lin Long and Liyao Li and Pengzuo Wu and Qi Zhang and Qingyi Huang and Saisai Yang and Tao Zhang and Wentao Ye and Wufang Zhu and Xiaomeng Hu and Xijun Gu and Xinjie Sun and Xiang Li and Yuhang Yang and Zhiqing Xiao},
      year={2024},
      eprint={2411.02059},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.02059}, 
}
```