	---
	license: apache-2.0
	language:
	- zh
	- en
	base_model:
	- Qwen/Qwen2.5-7B
	---

	# TableGPT2-7B

	## Model details

	We have developed and released TableGPT2-7B, a large-scale decoder specifically tailored for data-intensive tasks, with a focus on interpreting and analyzing tabular data. TableGPT2-7B is designed to bridge the gap between conventional LLM capabilities and the real-world demands of tabular and structured data tasks, such as those in business intelligence (BI), automated data-driven analysis, and applications tightly coupled with databases or data warehouses.

	Model Developers

	Zhejiang University

	Variations

	TableGPT2 is available in two configurations, 7B and 72B parameters, both derived from the Qwen2.5 model family and optimized for handling structured data in tabular formats. Currently, only the 7B version has been released to the public.

	Input

	TableGPT2-7B accepts both text and tabular data as input.

	Output

	TableGPT2-7B produces text-based outputs, specifically optimized for coding tasks, data interpretation, and BI-focused question answering.

	Language

	Our model places a strong emphasis on Chinese corpora, and currently, queries in other languages may have limited support.

	Other Requirements

	We highly recommend exploring [our repository on GitHub](https://github.com/tablegpt/tablegpt-agent), where users can integrate this model into our agent workflow for enhanced performance.

	Model Architecture

	TableGPT2-7B is built upon the Qwen2.5 architecture and includes specialized encoding for tabular data. It features a unique semantic encoder designed to interpret tabular data, capturing insights from rows, columns, and entire tables. Continual Pretraining (CPT) and Supervised Fine-Tuning (SFT) have been applied to equip the model for real-world BI applications and complex query processing.

	For now, the standalone decoder is open-sourced and fully functional without the encoder. The encoder is still being prepared for release, pending engineering considerations, primarily because we hope to provide tighter integration with DeepSpeed and vLLM.


	| Model | Training Data | Params | Context Length | Tokens | Tables |
	| ------------ | ------------------------------------------------ | ------ | -------------- | --------------------------------- | ------------- |
	| TableGPT2-7B | Multimodal data sources and BI-specific examples | 7B | 128K | 86B tokens CPT, 2.36M SFT samples | 593.8K tables |

	Status

	This model is static, trained on an offline dataset. Future versions may be released to enhance its performance on specialized tasks.

	Quickstart

	The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate a response.
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "tablegpt/TableGPT2-7B"

	# Load the model and tokenizer; "auto" selects the dtype and device placement.
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Hey, who are you?"
	messages = [
	    {"role": "system", "content": "You are a helpful assistant."},
	    {"role": "user", "content": prompt},
	]

	# Render the chat messages into a single prompt string for the model.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)

	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
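The quickstart above sends a plain text question. Since the model also accepts tabular data as input, the sketch below shows one possible way to embed a small table preview in the chat messages. The helper names (`render_table_preview`, `build_table_messages`), the tab-separated layout, and the prompt wording are all illustrative assumptions, not an official prompt template for TableGPT2-7B.

```python
def render_table_preview(columns, rows, max_rows=5):
    """Render the header and the first `max_rows` rows as tab-separated text."""
    lines = ["\t".join(str(c) for c in columns)]
    for row in rows[:max_rows]:
        lines.append("\t".join(str(v) for v in row))
    return "\n".join(lines)


def build_table_messages(columns, rows, question, table_name="df"):
    """Build chat messages that place a table preview ahead of the question.

    The prompt wording is a hypothetical example, not an official template.
    """
    preview = render_table_preview(columns, rows)
    user_content = (
        f"Below is a preview of the table `{table_name}` "
        f"({len(rows)} rows in total):\n"
        f"{preview}\n\n"
        f"Question: {question}"
    )
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_content},
    ]


# Toy data, made up purely for illustration.
columns = ["city", "year", "revenue"]
rows = [
    ["Hangzhou", 2022, 120.5],
    ["Hangzhou", 2023, 134.0],
    ["Shanghai", 2022, 210.3],
    ["Shanghai", 2023, 225.8],
]
messages = build_table_messages(
    columns, rows, "Which city grew faster from 2022 to 2023?"
)
print(messages[1]["content"])
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in place of the one used in the quickstart. Sending only a preview keeps the prompt short for large tables, at the cost of hiding rows the model might need.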

	Deployment

	For deployment, we recommend using vLLM.
	* Install vLLM: Run the following command.
	```bash
	pip install "vllm>=0.4.3"
	```
	* Model Deployment: Use vLLM to deploy your model. For example, the following command starts an OpenAI-compatible server:
	```bash
	python -m vllm.entrypoints.openai.api_server --served-model-name TableGPT2-7B --model path/to/weights
	```
	You can then access the Chat API with:

	```bash
	curl http://localhost:8000/v1/chat/completions \
	    -H "Content-Type: application/json" \
	    -d '{
	        "model": "TableGPT2-7B",
	        "messages": [
	            {"role": "system", "content": "You are a helpful assistant."},
	            {"role": "user", "content": "Hey, who are you?"}
	        ]
	    }'
	```
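The same request can be issued from Python. The sketch below uses only the standard library and assumes the server was started as shown above, listening on `localhost:8000` with the served model name `TableGPT2-7B`. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI-style chat completion schema.

```python
import json
import urllib.request

# Assumed to match the server started above: port 8000 and the name
# passed to --served-model-name.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "TableGPT2-7B"


def build_chat_request(user_message, url=API_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style schema: the reply is in the first choice's message.
    return body["choices"][0]["message"]["content"]


# With the server running:
#     print(chat("Hey, who are you?"))
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running server.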

	License

	The TableGPT2-7B license permits both research and commercial use, with further details available in the [GitHub repository](https://github.com/tablegpt/tablegpt-agent).

	Research Paper

	TableGPT2-7B is introduced and validated in the paper "[TableGPT2: A Large Multimodal Model with Tabular Data Integration](URL_TODO)" available on arXiv.

	Where to send questions or comments about the model

	Inquiries and feedback are welcome at [[email protected]](mailto:[email protected]).

	## Training Data

	Overview

	Training for TableGPT2-7B involved more than 593,800 curated tables, over 86 billion tokens for continual pretraining (CPT) and the construction of over 2.36 million high-quality query-table-output tuples for supervised fine-tuning. This extensive dataset aims to meet the rigorous demands of modern applications involving structured or tabular data.

	Data Freshness

	The training data has a cutoff of October 2024.

	## Evaluation Results

	Evaluation has shown that TableGPT2-7B performs consistently well across benchmarks for tabular comprehension, code generation, and structured data reasoning, achieving a 35.20% performance increase over comparable models on standard benchmarks and 49.32% on BI-focused assessments. The RealTabBench benchmark further demonstrated the model’s robustness in handling unconventional tables and complex queries. Below, we present the results on public table-related benchmarks.

	| Benchmark | Metric | GPT-4o | TableLLM (Qwen2) | TableLLM (CodeQwen) | TableLLM (LLaMA3) | TableLLM (LLaMA3.1) | TableLLM (DeepSeek) | TableLLM-13B | DeepSeek-lite | Yi-Coder | Qwen2.5-Coder | Qwen2.5-Instruct | TableGPT2-7B | TableGPT2-72B |
	| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
	| Table Understanding | | | | | | | | | | | | | | |
	| Col Type Annot. | F1 | 31.75 | 10.10 | 5.71 | 1.47 | 1.59 | 6.04 | 12.70 | 20.58 | 5.38 | 32.59 | 22.19 | 85.88 | 85.67 |
	| Relation Extract. | F1 | 52.95 | 1.60 | 3.79 | 2.39 | 2.00 | 3.34 | 18.16 | 8.67 | 2.25 | 31.00 | 15.92 | 83.35 | 79.50 |
	| Entity Linking | Acc | 90.80 | 47.10 | 39.70 | 0.20 | 0.60 | 15.50 | 66.25 | 70.15 | 41.75 | 71.70 | 82.25 | 92.00 | 93.30 |
	| Row Pop. | MAP | 53.40 | 2.20 | 5.14 | 1.93 | 6.23 | 3.13 | 14.25 | 1.20 | 1.00 | 13.23 | 12.30 | 59.97 | 55.83 |
	| Question Answering | | | | | | | | | | | | | | |
	| HiTab | Exec Acc | 48.40 | 11.74 | 0.00 | 0.00 | 0.00 | 39.08 | 6.30 | 0.76 | 0.00 | 1.70 | 10.73 | 70.27 | 75.57 |
	| FetaQA | BLEU | 21.70 | 12.24 | 8.69 | 2.42 | 3.10 | 7.94 | 10.83 | 15.08 | 11.17 | 13.00 | 16.91 | 28.97 | 32.25 |
	| HybridQA | Acc | 58.60 | 27.12 | 20.14 | 27.35 | 27.61 | 19.53 | 51.88 | 42.58 | 29.83 | 51.10 | 51.13 | 53.17 | 56.41 |
	| WikiSQL | Acc | 47.60 | 46.50 | 37.20 | 39.26 | 39.00 | 36.14 | 41.10 | 38.30 | 25.34 | 46.90 | 47.42 | 53.74 | 57.32 |
	| WikiTQ | Acc | 68.40 | 64.16 | 36.05 | 34.95 | 38.84 | 36.05 | 66.30 | 47.65 | 43.37 | 74.50 | 68.55 | 61.42 | 71.45 |
	| Fact Verification | | | | | | | | | | | | | | |
	| TabFact | Acc | 74.40 | 72.00 | 53.20 | 40.06 | 27.13 | 60.76 | 68.95 | 62.27 | 79.6 | 77.26 | 84.60 | 77.80 | 85.43 |
	| FEVEROUS | Acc | 71.60 | 20.10 | 46.90 | 51.50 | 42.30 | 18.39 | 21.45 | 7.80 | 38.10 | 60.70 | 63.30 | 78.05 | 76.80 |
	| Table to Text | | | | | | | | | | | | | | |
	| ToTTo | BLEU | 12.21 | 6.95 | 3.10 | 5.50 | 6.23 | 3.81 | 5.36 | 8.76 | 2.64 | 10.50 | 11.91 | 14.10 | 22.69 |
	| Natural Language to SQL | | | | | | | | | | | | | | |
	| BIRD(dev) | Exec Acc | - | 9.13 | 7.37 | 1.83 | 2.48 | 0.39 | 0.72 | 25.10 | 24.19 | 27.18 | 18.97 | 31.42 | 38.40 |
	| BIRD(dev-knowledge) | Exec Acc | - | 15.45 | 18.19 | 3.39 | 3.72 | 0.39 | 1.83 | 36.51 | 39.96 | 42.96 | 31.42 | 49.28 | 60.76 |
	| Spider(dev) | Exec Acc | - | 42.26 | 32.88 | 12.86 | 18.96 | 2.71 | 4.26 | 66.44 | 58.12 | 70.99 | 61.70 | 76.31 | 79.40 |
	| Spider(test) | Exec Acc | - | 40.29 | 34.93 | 12.02 | 16.35 | 7.33 | 2.93 | 66.65 | 56.87 | 69.73 | 60.18 | 74.38 | 78.48 |
	| Holistic Table Evaluation | | | | | | | | | | | | | | |
	| TableBench | DP | - | 26.62 | 26.44 | 26.71 | 26.73 | 26.15 | 3.88 | 29.60 | 21.94 | 28.67 | 25.18 | 32.03 | 38.90 |
	| TableBench | TCoT | - | 37.08 | 31.33 | 29.79 | 30.01 | 28.65 | 3.85 | 30.93 | 22.8 | 36.25 | 29.77 | 42.34 | 50.06 |
	| TableBench | SCoT | - | 14.11 | 17.78 | 9.60 | 12.38 | 22.39 | 2.88 | 22.61 | 8.43 | 25.95 | 24.35 | 25.01 | 30.47 |
	| TableBench | PoT@1 | - | 21.05 | 26.39 | | | | | | | | | | |
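
Several rows in the table report execution accuracy (Exec Acc). The evaluation harness behind these numbers is not described here, so the following is only a simplified sketch of the general idea: a predicted SQL query counts as correct when executing it returns the same rows as the reference query. The toy `sales` table, the example queries, and the helper names are assumptions made for illustration, and official benchmark scripts may handle row order, duplicates, and value normalization differently.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute a query; return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) SQL pairs whose results match.

    Row order is ignored in this sketch; real harnesses may differ.
    """
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return 100.0 * correct / len(pairs)


# Toy in-memory database, made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Hangzhou", 120.5), ("Shanghai", 210.3), ("Beijing", 180.0)],
)

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT city FROM sales WHERE revenue > 150 ORDER BY city",
     "SELECT city FROM sales WHERE revenue >= 180"),
    # Wrong filter: counted as incorrect.
    ("SELECT city FROM sales WHERE revenue < 150",
     "SELECT city FROM sales WHERE revenue > 200"),
    # Invalid SQL: counted as incorrect.
    ("SELEC city FROM sales", "SELECT city FROM sales"),
]
score = execution_accuracy(conn, pairs)
print(f"Exec Acc: {score:.2f}")
```

Comparing results rather than query text means that two differently written queries can both be scored as correct, which is why this metric is commonly preferred over exact string match for text-to-SQL tasks.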

	## Citation

	If you find our work helpful, please cite it as follows:

	```
	XXX
	```