	---
	license: apache-2.0
	datasets:
	- yuan-tian/chartgpt-dataset
	language:
	- en
	metrics:
	- rouge
	pipeline_tag: text2text-generation
	base_model:
	- google/flan-t5-xl
	new_version: yuan-tian/chartgpt-llama3
	---
	# Model Card for ChartGPT

	## Model Details

	### Model Description

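	ChartGPT generates charts from natural language utterances over a data table. It works step by step: given the table and the utterance, the model answers one of six chart-construction steps at a time (see the input format below). For more information, please refer to the research paper linked below.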

	* Model type: Language model
	* Language(s) (NLP): English
	* License: Apache 2.0
	* Finetuned from model: [FLAN-T5-XL](https://huggingface.co/google/flan-t5-xl)
	* Research paper: [ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language](https://ieeexplore.ieee.org/document/10443572)

	### Model Input Format

	<details>
	<summary> Click to expand </summary>

	The model input at step `x` has the following format. Tokens of the form `<...>` serve as separation tokens.

	```
	{table name}
	<head> {column names}
	<type> {column types}
	<data> {data row 1} <line> {data row 2} <line>
	<utterance> {NL utterance}
	<ans>
	<sep> {Step 1 prompt} {Answer 1}
	...
	<sep> {Step x-1 prompt} {Answer x-1}
	<sep> {Step x prompt}
	```

	Given this input, the model should output the answer for step `x`.

	The step 1-6 prompts are as follows:

	```
	Step 1. Select columns:
	Step 2. Add filter:
	Step 3. Add aggregations:
	Step 4. Select chart type:
	Step 5. Choose encoding:
	Step 6. Add sort:
	```
	</details>
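
The input format above can be assembled programmatically. The following is a minimal sketch, not the authors' preprocessing code: the helper name `build_input` and its parameters are hypothetical, the step prompts are copied from the list above, and the single-line, space-joined layout is assumed from the usage example in this card. The exact prompt wording the model expects should be checked against that example.

```python
from typing import Sequence

# Step prompts copied from the list in this card. Whether the model needs
# exactly this wording is an assumption.
STEP_PROMPTS = [
    "Step 1. Select columns:",
    "Step 2. Add filter:",
    "Step 3. Add aggregations:",
    "Step 4. Select chart type:",
    "Step 5. Choose encoding:",
    "Step 6. Add sort:",
]


def build_input(
    table_name: str,
    columns: Sequence[str],
    types: Sequence[str],
    rows: Sequence[Sequence[object]],
    utterance: str,
    previous_answers: Sequence[str] = (),
    step_prompts: Sequence[str] = STEP_PROMPTS,
) -> str:
    """Build the input for step x, where x = len(previous_answers) + 1."""
    step_index = len(previous_answers)
    if step_index >= len(step_prompts):
        raise ValueError("every step already has an answer")

    # Each data row is comma-joined and terminated by a <line> token.
    data = " ".join(",".join(str(v) for v in row) + " <line>" for row in rows)

    parts = [
        table_name,
        "<head> " + ",".join(columns),
        "<type> " + ",".join(types),
        "<data> " + data,
        "<utterance> " + utterance,
        "<ans>",
    ]
    # Steps that are already answered appear as "<sep> {prompt} {answer}".
    for prompt, answer in zip(step_prompts, previous_answers):
        parts.append(f"<sep> {prompt} {answer}")
    # The current step ends the input with its prompt and no answer.
    parts.append(f"<sep> {step_prompts[step_index]}")
    return " ".join(parts)
```

Calling `build_input(...)` with no `previous_answers` yields the step 1 input. Passing the answers collected so far yields the input for the next step.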

	## How to Get Started with the Model

	### Running the Model on a GPU

	The example below uses a movie dataset with the utterance "What kinds of movies are the most popular?".
	Given this input, the model should output the answer to step 1 (select columns).
	You can use the code to check that the model runs successfully on your machine.

	<details>
	<summary> Click to expand </summary>

	```python
	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

	# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
	tokenizer = AutoTokenizer.from_pretrained("yuan-tian/chartgpt")
	model = AutoModelForSeq2SeqLM.from_pretrained("yuan-tian/chartgpt", device_map="auto")

	# Table schema, two sample rows, the utterance, and the step 1 prompt.
	input_text = "movies <head> Title,Worldwide_Gross,Production_Budget,Release_Year,Content_Rating,Running_Time,Major_Genre,Creative_Type,Rotten_Tomatoes_Rating,IMDB_Rating <type> nominal,quantitative,quantitative,temporal,nominal,quantitative,nominal,nominal,quantitative,quantitative <data> From Dusk Till Dawn,25728961,20000000,1996,R,107,Horror,Fantasy,63,7.1 <line> Broken Arrow,148345997,65000000,1996,R,108,Action,Contemporary Fiction,55,5.8 <line> <utterance> What kinds of movies are the most popular? <ans> <sep> Step 1. Select the columns:"

	# Move the inputs to the model's device instead of hard-coding "cuda".
	inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(model.device)
	outputs = model.generate(**inputs)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	</details>
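
Because each step's input contains the answers to all earlier steps, the six steps can be chained in a loop. The sketch below illustrates that chaining under stated assumptions; it is not the authors' inference code. The function `run_all_steps` and the `generate` callable are hypothetical names, and the step prompts are copied from the list in the input format section, so their wording may need to be adjusted to match what the model expects.

```python
from typing import Callable, List, Sequence

# Step prompts copied from this card. The exact wording is an assumption.
STEP_PROMPTS = [
    "Step 1. Select columns:",
    "Step 2. Add filter:",
    "Step 3. Add aggregations:",
    "Step 4. Select chart type:",
    "Step 5. Choose encoding:",
    "Step 6. Add sort:",
]


def run_all_steps(
    context: str,
    generate: Callable[[str], str],
    step_prompts: Sequence[str] = STEP_PROMPTS,
) -> List[str]:
    """Query the model once per step, feeding earlier answers back in.

    `context` is the input up to and including the `<ans>` token.
    `generate` maps an input string to the decoded model output.
    """
    answers: List[str] = []
    so_far = context
    for prompt in step_prompts:
        model_input = f"{so_far} <sep> {prompt}"
        answer = generate(model_input).strip()
        answers.append(answer)
        # The next step sees this step's prompt followed by its answer.
        so_far = f"{model_input} {answer}"
    return answers


# Hypothetical wiring to a tokenizer and model loaded as in the example above:
# def generate(text: str) -> str:
#     inputs = tokenizer(text, return_tensors="pt").to(model.device)
#     outputs = model.generate(**inputs)
#     return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Passing `generate` as a callable keeps the loop independent of how the model is loaded, so it can be exercised with a stub before running it on a GPU.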

	## Training Details

	### Training Data

	This model is fine-tuned from [FLAN-T5-XL](https://huggingface.co/google/flan-t5-xl) on the [chartgpt-dataset](https://huggingface.co/datasets/yuan-tian/chartgpt-dataset).

	### Training Procedure

	Details of the preprocessing and training procedure will be added in a future update.

	## Citation


	BibTeX:

	```
	@article{tian2024chartgpt,
	title={ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language},
	author={Tian, Yuan and Cui, Weiwei and Deng, Dazhen and Yi, Xinjing and Yang, Yurun and Zhang, Haidong and Wu, Yingcai},
	journal={IEEE Transactions on Visualization and Computer Graphics},
	year={2024},
	pages={1-15},
	doi={10.1109/TVCG.2024.3368621}
	}
	```