Spaces: etweedy/roberta-squad-v2

etweedy committed on Jul 7, 2023

Commit c09673f, 1 parent: e77a114

Upload 9 files

Files changed (6)

app.py +61 -25
examples.csv +3 -3
lib/.DS_Store +0 -0
lib/.ipynb_checkpoints/utils-checkpoint.py +10 -0
lib/__pycache__/utils.cpython-310.pyc +0 -0
lib/utils.py +10 -0

app.py CHANGED

@@ -24,7 +24,7 @@ else:
 # - add examples
 # What else??
 if 'response' not in st.session_state:
     st.session_state['response'] = ''
 if 'context' not in st.session_state:
@@ -52,37 +52,73 @@ def clear_boxes():
 with st.spinner('Loading the model...'):
     model, tokenizer = get_model()
-ex_q, ex_c = get_examples()
-for i in range(len(ex_q)):
-    st.sidebar.button(
-        label = f'Try example {i+1}',
-        key = f'ex_button_{i+1}',
-        on_click = fill_in_example,
-        args=(i,),
-    )
-st.sidebar.button(
-    label = 'Clear boxes',
-    key = 'clear_button',
-    on_click = clear_boxes,
-)
 st.header('RoBERTa Q&A model')
 st.markdown('''
-This app demonstrates the answer-retrieval capabilities of a finetuned RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers) model.  The [RoBERTa base model](https://huggingface.co/roberta-base) was fine-tuned on version 2 of the [SQuAD (Stanford Question Answering Dataset) dataset](https://huggingface.co/datasets/squad_v2), a dataset of context-question-answer triples.  The objective of the model is to retrieve the answer to the question from the context paragraph.
-Version 2 incorporates the 100,000 samples from Version 1.1, along with 50,000 'unanswerable' questions, i.e. samples in the question cannot be answered using the context given.
-Please type or paste a context paragraph and question you'd like to ask about it.  The model will attempt to answer the question, or otherwise will report that it cannot.
-Alternatively, you can try some of the examples provided on the sidebar to the left.
 ''')
 input_container = st.container()
-st.divider()
 response_container = st.container()
 # Form for user inputs
 with input_container:
     with st.form(key='input_form',clear_on_submit=False):
@@ -94,7 +130,6 @@ with input_container:
             placeholder='Enter your context paragraph here.',
             height=300,
         )
-        st.session_state['context'] = context
         question = st.text_input(
             label='Question',
             value=st.session_state['question'],
@@ -102,9 +137,10 @@ with input_container:
             label_visibility='hidden',
             placeholder='Enter your question here.',
         )
-        st.session_state['question'] = question
         query_submitted = st.form_submit_button("Submit")
         if query_submitted:
             with st.spinner('Generating response...'):
                 data_raw = Dataset.from_dict(
                     {

 # - add examples
 # What else??
+# Initialize session state variables
 if 'response' not in st.session_state:
     st.session_state['response'] = ''
 if 'context' not in st.session_state:
 with st.spinner('Loading the model...'):
     model, tokenizer = get_model()
 st.header('RoBERTa Q&A model')
 st.markdown('''
+This app demonstrates the answer-retrieval capabilities of a fine-tuned RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers) model.
+''')
+with st.expander('Click to read more about the model...'):
+    st.markdown('''
+* [Click here](https://huggingface.co/etweedy/roberta-base-squad-v2) to visit the Hugging Face model card for this fine-tuned model.
+* To create this model, the [RoBERTa base model](https://huggingface.co/roberta-base) was fine-tuned on Version 2 of [SQuAD (Stanford Question Answering Dataset)](https://huggingface.co/datasets/squad_v2), a dataset of context-question-answer triples.
+* The objective of the model is "extractive question answering", the task of retrieving the answer to the question from a given context text corpus.
+* SQuAD Version 2 incorporates the 100,000 samples from Version 1.1, along with 50,000 'unanswerable' questions, i.e. samples in which the question cannot be answered using the context given.
+* The original base RoBERTa model was introduced in [this paper](https://arxiv.org/abs/1907.11692) and [this repository](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta).  Here's a citation for that base model:
+```bibtex
+@article{DBLP:journals/corr/abs-1907-11692,
+  author    = {Yinhan Liu and
+               Myle Ott and
+               Naman Goyal and
+               Jingfei Du and
+               Mandar Joshi and
+               Danqi Chen and
+               Omer Levy and
+               Mike Lewis and
+               Luke Zettlemoyer and
+               Veselin Stoyanov},
+  title     = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
+  journal   = {CoRR},
+  volume    = {abs/1907.11692},
+  year      = {2019},
+  url       = {http://arxiv.org/abs/1907.11692},
+  archivePrefix = {arXiv},
+  eprint    = {1907.11692},
+  timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
+''')
+st.markdown('''
+Please type or paste a context paragraph and question you'd like to ask about it.  The model will attempt to answer the question, or otherwise will report that it cannot.  Your results will appear below the question field when the model is finished running.
+Alternatively, you can try an example by clicking one of the buttons below:
 ''')
+ex_q, ex_c = get_examples()
+example_container = st.container()
 input_container = st.container()
 response_container = st.container()
+with example_container:
+    ex_cols = st.columns(len(ex_q)+1)
+    for i in range(len(ex_q)):
+        with ex_cols[i]:
+            st.button(
+                label = f'Try example {i+1}',
+                key = f'ex_button_{i+1}',
+                on_click = fill_in_example,
+                args=(i,),
+            )
+    with ex_cols[-1]:
+        st.button(
+            label = "Clear all fields",
+            key = "clear_button",
+            on_click = clear_boxes,
+        )
 # Form for user inputs
 with input_container:
     with st.form(key='input_form',clear_on_submit=False):
             placeholder='Enter your context paragraph here.',
             height=300,
         )
         question = st.text_input(
             label='Question',
             value=st.session_state['question'],
             label_visibility='hidden',
             placeholder='Enter your question here.',
         )
         query_submitted = st.form_submit_button("Submit")
         if query_submitted:
+            st.session_state['question'] = question
+            st.session_state['context'] = context
             with st.spinner('Generating response...'):
                 data_raw = Dataset.from_dict(
                     {
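
Note on the app.py change: the diff wires the example buttons to `fill_in_example` and `clear_boxes` through `on_click`, and moves the `st.session_state` writes for `question` and `context` inside the `if query_submitted:` branch. The bodies of the two callbacks are not part of this diff, so the following is only a minimal sketch of how such callbacks might behave. It assumes a plain dict standing in for `st.session_state` (so it runs without Streamlit), and the example data and the `submit` helper are hypothetical.

```python
# Plain dict standing in for st.session_state (assumption: lets the sketch run without Streamlit).
session_state = {"question": "", "context": "", "response": ""}

# Hypothetical example data; in the app these lists come from get_examples().
EX_Q = ["What colour is the sky?", "Who wrote the report?"]
EX_C = ["The sky is blue on a clear day.", "The report was written by the audit team."]


def fill_in_example(i, state=session_state):
    """Copy example i into the stored question and context, and reset the response."""
    state["question"] = EX_Q[i]
    state["context"] = EX_C[i]
    state["response"] = ""


def clear_boxes(state=session_state):
    """Reset every stored field to an empty string."""
    for key in ("question", "context", "response"):
        state[key] = ""


def submit(question, context, state=session_state):
    """Mirror the `if query_submitted:` branch: inputs are stored only on submit."""
    state["question"] = question
    state["context"] = context


fill_in_example(1)
print(session_state["question"])  # Who wrote the report?
clear_boxes()
print(session_state)  # {'question': '', 'context': '', 'response': ''}
```

In the real app the widgets read their `value` from `st.session_state`, so a callback that updates the stored values before the rerun is what makes a button click show up in the text fields. One plausible reading of the change is that writing the form values back only on submit keeps a rerun with stale widget values from overwriting what a callback just stored.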

examples.csv CHANGED

@@ -1,4 +1,4 @@
 question,context
-What did Oppenheimer remark abotut the explosion?,"Oppenheimer attended Harvard University, where he earned a bachelor's degree in chemistry in 1925. He studied physics at the University of Cambridge and University of Göttingen, where he received his PhD in 1927. He held academic positions at the University of California, Berkeley, and the California Institute of Technology, and made significant contributions to theoretical physics, including in quantum mechanics and nuclear physics. During World War II, he was recruited to work on the Manhattan Project, and in 1943 was appointed as director of the Los Alamos Laboratory in New Mexico, tasked with developing the weapons. Oppenheimer's leadership and scientific expertise were instrumental in the success of the project. He was among those who observed the Trinity test on July 16, 1945, in which the first atomic bomb was successfully detonated. He later remarked that the explosion brought to his mind words from the Hindu scripture Bhagavad Gita: ""Now I am become Death, the destroyer of worlds."" In August 1945, the atomic bombs were used on the Japanese cities of Hiroshima and Nagasaki, the only use of nuclear weapons in war."
-What was the phrase on the billboard which inspired the Twinkies name?,"Twinkies were invented on April 6, 1930, by Canadian-born baker James Alexander Dewar for the Continental Baking Company in Schiller Park, Illinois. Realizing that several machines used for making cream-filled strawberry shortcake sat idle when strawberries were out of season, Dewar conceived a snack cake filled with banana cream, which he dubbed the Twinkie. Ritchy Koph said he came up with the name when he saw a billboard in St. Louis for ""Twinkle Toe Shoes"".  During World War II, bananas were rationed, and the company was forced to switch to vanilla cream. This change proved popular, and banana-cream Twinkies were not widely re-introduced. The original flavor was occasionally found in limited time only promotions, but the company used vanilla cream for most Twinkies. In 1988, Fruit and Cream Twinkies were introduced with a strawberry filling swirled into the cream. The product was soon dropped. Vanilla's dominance over banana flavoring was challenged in 2005, following a month-long promotion of the movie King Kong. Hostess saw its Twinkie sales rise 20 percent during the promotion, and in 2007 restored the banana-cream Twinkie to its snack lineup although they are now made with 2% banana purée."
-What happened in November 2020?,"""Baby Shark"" is a children's song associated with a dance involving hand movements that originated as a campfire song dating back to at least the 20th century. In 2016, ""Baby Shark"" became very popular when Pinkfong, a South Korean entertainment company, released a version of the song with a YouTube music video that went viral across social media, online video, and radio. In January 2022, it became the first YouTube video to reach 10 billion views. In November 2020, Pinkfong's version became the most-viewed YouTube video of all time, with over 12 billion views as of April 2023. ""Baby Shark"" originated as a campfire song or chant. The original song dates back to at least the 20th century, potentially created by camp counselors inspired by the movie Jaws. In the chant, each member of a family of sharks is introduced, with campers using their hands to imitate the sharks' jaws. Different versions of the song have the sharks hunting fish, eating a sailor, or killing people who then go to heaven. Various entities have copyrighted original videos and sound recordings of the song, and some have trademarked merchandise based on their versions. However, according to The New York Times, the underlying song and characters are believed to be in the public domain."

 question,context
+What did Oppenheimer remark about the explosion?,"Oppenheimer attended Harvard University, where he earned a bachelor's degree in chemistry in 1925. He studied physics at the University of Cambridge and University of Göttingen, where he received his PhD in 1927. He held academic positions at the University of California, Berkeley, and the California Institute of Technology, and made significant contributions to theoretical physics, including in quantum mechanics and nuclear physics. During World War II, he was recruited to work on the Manhattan Project, and in 1943 was appointed as director of the Los Alamos Laboratory in New Mexico, tasked with developing the weapons. Oppenheimer's leadership and scientific expertise were instrumental in the success of the project. He was among those who observed the Trinity test on July 16, 1945, in which the first atomic bomb was successfully detonated. He later remarked that the explosion brought to his mind words from the Hindu scripture Bhagavad Gita: ""Now I am become Death, the destroyer of worlds."" In August 1945, the atomic bombs were used on the Japanese cities of Hiroshima and Nagasaki, the only use of nuclear weapons in war."
+Why did Twinkies change to vanilla cream?,"Twinkies were invented on April 6, 1930, by Canadian-born baker James Alexander Dewar for the Continental Baking Company in Schiller Park, Illinois. Realizing that several machines used for making cream-filled strawberry shortcake sat idle when strawberries were out of season, Dewar conceived a snack cake filled with banana cream, which he dubbed the Twinkie. Ritchy Koph said he came up with the name when he saw a billboard in St. Louis for ""Twinkle Toe Shoes"".  During World War II, bananas were rationed, and the company was forced to switch to vanilla cream. This change proved popular, and banana-cream Twinkies were not widely re-introduced. The original flavor was occasionally found in limited time only promotions, but the company used vanilla cream for most Twinkies. In 1988, Fruit and Cream Twinkies were introduced with a strawberry filling swirled into the cream. The product was soon dropped. Vanilla's dominance over banana flavoring was challenged in 2005, following a month-long promotion of the movie King Kong. Hostess saw its Twinkie sales rise 20 percent during the promotion, and in 2007 restored the banana-cream Twinkie to its snack lineup although they are now made with 2% banana purée."
+When was Pinkfong founded?,"""Baby Shark"" is a children's song associated with a dance involving hand movements that originated as a campfire song dating back to at least the 20th century. In 2016, ""Baby Shark"" became very popular when Pinkfong, a South Korean entertainment company, released a version of the song with a YouTube music video that went viral across social media, online video, and radio. In January 2022, it became the first YouTube video to reach 10 billion views. In November 2020, Pinkfong's version became the most-viewed YouTube video of all time, with over 12 billion views as of April 2023. ""Baby Shark"" originated as a campfire song or chant. The original song dates back to at least the 20th century, potentially created by camp counselors inspired by the movie Jaws. In the chant, each member of a family of sharks is introduced, with campers using their hands to imitate the sharks' jaws. Different versions of the song have the sharks hunting fish, eating a sailor, or killing people who then go to heaven. Various entities have copyrighted original videos and sound recordings of the song, and some have trademarked merchandise based on their versions. However, according to The New York Times, the underlying song and characters are believed to be in the public domain."

lib/.DS_Store CHANGED

Binary files a/lib/.DS_Store and b/lib/.DS_Store differ

lib/.ipynb_checkpoints/utils-checkpoint.py CHANGED

@@ -189,6 +189,16 @@ def make_predictions(model,tokenizer,inputs,examples,
     return predicted_answers
 def get_examples():
     examples = pd.read_csv('examples.csv')
     questions = list(examples['question'])
     contexts = list(examples['context'])

     return predicted_answers
 def get_examples():
+    """
+    Retrieve pre-made examples from a .csv file
+    Parameters: None
+    -----------
+    Returns:
+    --------
+    questions, contexts : list, list
+        Lists of examples of corresponding question-context pairs
+    """
     examples = pd.read_csv('examples.csv')
     questions = list(examples['question'])
     contexts = list(examples['context'])

lib/__pycache__/utils.cpython-310.pyc CHANGED

Binary files a/lib/__pycache__/utils.cpython-310.pyc and b/lib/__pycache__/utils.cpython-310.pyc differ

lib/utils.py CHANGED

@@ -189,6 +189,16 @@ def make_predictions(model,tokenizer,inputs,examples,
     return predicted_answers
 def get_examples():
     examples = pd.read_csv('examples.csv')
     questions = list(examples['question'])
     contexts = list(examples['context'])

     return predicted_answers
 def get_examples():
+    """
+    Retrieve pre-made examples from a .csv file
+    Parameters: None
+    -----------
+    Returns:
+    --------
+    questions, contexts : list, list
+        Lists of examples of corresponding question-context pairs
+    """
     examples = pd.read_csv('examples.csv')
     questions = list(examples['question'])
     contexts = list(examples['context'])
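
Note on `get_examples`: the hunk ends before the function's return statement, and the contexts in examples.csv contain quotation marks escaped as doubled quotes (`""`). As a rough illustration of what the docstring describes, here is a sketch that uses the standard-library `csv` module instead of pandas. The function name `get_examples_from`, its file-handle parameter, and the one-row sample are assumptions made for the example, not the repository's code.

```python
import csv
import io


def get_examples_from(handle):
    """Read question-context pairs from an open CSV file.

    Hypothetical variant of get_examples that takes a file handle
    instead of reading the hard-coded 'examples.csv' path.
    """
    questions, contexts = [], []
    for row in csv.DictReader(handle):
        questions.append(row["question"])
        contexts.append(row["context"])
    return questions, contexts


# Hypothetical one-row sample in the same layout as examples.csv.
SAMPLE = (
    "question,context\n"
    'What was on the billboard?,"He saw a billboard for ""Twinkle Toe Shoes""."\n'
)

questions, contexts = get_examples_from(io.StringIO(SAMPLE))
print(contexts[0])  # He saw a billboard for "Twinkle Toe Shoes".
```

The doubled quotes inside the quoted field are collapsed to single quotation marks by the reader, so the context text reaches the model without the CSV escaping. `pd.read_csv` handles the same convention by default.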