saadob12 committed
Commit 4c9b597
1 Parent(s): b9f5b00

Update README.md

Files changed (1)
  1. README.md +34 -34

README.md CHANGED
@@ -38,27 +38,27 @@ The input prompt template consists of the `title, x-y labels, and x-y values`.
Before every input to the model, append `C2T:`.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("saadob12/t5_C2T_big")
model = AutoModelForSeq2SeqLM.from_pretrained("saadob12/t5_C2T_big")

# Chart input in the `title, x-y labels, x-y values` format described above.
data = 'Breakdown of coronavirus (COVID-19) deaths in South Korea as of March 16, 2020\n' \
       'by chronic disease x-y labels\n' \
       'Response - Share of cases, x-y values Circulatory system disease* 62.7%, ' \
       'Endocrine and metabolic diseases** 46.7%, Mental illness*** 25.3%, ' \
       'Respiratory diseases*** 24%, Urinary and genital diseases 14.7%, Cancer 13.3%, ' \
       'Nervous system diseases 4%, Digestive system diseases 2.7%, Blood and hematopoietic diseases 1.3%'

prefix = 'C2T: '
tokens = tokenizer.encode(prefix + data, truncation=True, padding='max_length', return_tensors='pt')
generated = model.generate(tokens, num_beams=4, max_length=256)
tgt_text = tokenizer.decode(generated[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
summary = tgt_text.strip('[]""')

# Summary:
# As of March 16, 2020, around 62.7% of COVID-19 deaths in South Korea were related to circulatory system diseases.
# Other chronic diseases include endocrine and metabolic diseases, mental illness, and cancer.
# South Korea confirmed 30,017 infection cases, including 501 deaths.
```
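The same calls also work on a batch of prompts, reusing the `tokenizer`, `model`, `prefix`, and `data` defined above. This is only a minimal sketch; the batch contents and generation settings here are illustrative, not part of the original card:

```python
# Illustrative batch variant: encode several C2T-prefixed chart strings at once
# and decode all beam-search outputs together.
prompts = [prefix + data]  # append further chart strings here

enc = tokenizer(prompts, truncation=True, padding=True, return_tensors='pt')
outputs = model.generate(enc['input_ids'], attention_mask=enc['attention_mask'],
                         num_beams=4, max_length=256)
summaries = tokenizer.batch_decode(outputs, skip_special_tokens=True,
                                   clean_up_tokenization_spaces=True)
```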
# Intended Use and Limitations
You can use the model to generate summaries of data files.
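The card does not fix a file format, but a simple two-column data file can be turned into the prompt layout shown above along these lines. The file name, column layout, and the helper function are only an assumed illustration, not part of the original card:

```python
import csv

def csv_to_c2t_prompt(path, title):
    """Hypothetical helper: turn a two-column CSV (first row = axis labels,
    remaining rows = label/value pairs) into the prompt layout shown above."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    x_label, y_label = rows[0]
    pairs = ', '.join(f'{x} {y}' for x, y in rows[1:])
    return (f'C2T: {title}\n'
            f'x-y labels\n'
            f'{x_label} - {y_label}, x-y values {pairs}')

# prompt = csv_to_c2t_prompt('chart.csv', 'Chart title goes here')
```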
@@ -85,19 +85,19 @@ May or may not generate an **okay** summary at best for the following kind of data
# Citation
Kindly cite the work. Thank you.
```
@inproceedings{obaid-ul-islam-etal-2023-tackling,
    title = {Tackling Hallucinations in Neural Chart Summarization},
    author = {Obaid ul Islam, Saad and Škrjanec, Iza and Dusek, Ondrej and Demberg, Vera},
    booktitle = {Proceedings of the 16th International Natural Language Generation Conference},
    month = sep,
    year = {2023},
    address = {Prague, Czechia},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2023.inlg-main.30},
    doi = {10.18653/v1/2023.inlg-main.30},
    pages = {414--423},
    abstract = {Hallucinations in text generation occur when the system produces text that is not grounded in the input. In this work, we tackle the problem of hallucinations in neural chart summarization. Our analysis shows that the target side of chart summarization training datasets often contains additional information, leading to hallucinations. We propose a natural language inference (NLI) based method to preprocess the training data and show through human evaluation that our method significantly reduces hallucinations. We also found that shortening long-distance dependencies in the input sequence and adding chart-related information like title and legends improves the overall performance.}
}

```
 