import os

import pandas as pd
import streamlit as st
from PIL import Image
from transformers import pipeline

# read in example claim-evidence pairs
df = pd.read_csv("images/example_claim_evidence_pairs.csv")

# title
st.title('Combatting Climate Change Misinformation with Transformers')

st.markdown("## The Gist")
st.markdown("**Problem**🤔: Climate change misinformation spreads quickly and is difficult to combat. However, it's important to do so, because climate change misinformation has direct impacts on public opinion and public policy surrounding climate change.")
st.markdown("**Solution**💡: Develop a pipeline in which users input a climate change claim and the pipeline returns whether the claim is refuted or supported by current climate science, along with the corresponding evidence.")
st.markdown("**Approach**🔑:")
st.markdown("* There are many steps to this pipeline. Here, I focus on fine-tuning a transformer model, ClimateBERT, using the textual entailment task.")
st.markdown("* The dataset used is Climate FEVER, a natural language inference dataset with 1,535 {claim, [evidence], [label]} tuples")
st.markdown("* Given a {claim, evidence} pair, determine whether the climate claim is supported or refuted (or neither) by the evidence")
st.markdown("---")

st.markdown("## The Details")

# section 1: the context, problem; how to address
st.markdown("### Problem 🤔")
st.markdown("Misinformation about climate change spreads quickly and has direct impacts on public opinion and public policy surrounding the climate. Further, misinformation is difficult to combat, and people are able to \"verify\" false climate claims on biased sites. Ideally, people would be able to easily verify climate claims. This is where transformers come in.")

# section 2: what is misinformation? how is it combatted now? how successful is this?
st.markdown("### More about Misinformation")
st.markdown("What is misinformation? How does it spread?")
st.markdown("* **Misinformation** can be defined as “false or inaccurate information, especially that which is deliberately intended to deceive.”")
st.markdown("* It can exist in different domains, and each domain has different creators and distributors of misinformation.")
st.markdown("* Misinformation regarding climate change is often funded by conservative foundations or large energy industries such as gas, coal, and oil. (1)")

misinfo_flowchart = Image.open('images/misinfo_chart.jpeg')
st.image(misinfo_flowchart, caption='The misinformation flowchart. (1)')

st.markdown("**Why does this matter?** Through echo chambers, polarization, and feedback loops, misinformation can spread from these large organizations to the public, thus arming the public with persuasive information designed to create scepticism around and/or denial of climate change, its urgency, and climate change scientists. This is especially problematic in democratic societies, where the public, to some extent, influences governmental policy decisions (2). Existing research suggests that misinformation directly contributes to public support of political inaction and active stalling or rejection of pro-climate policies (1).")
st.markdown("How is climate change misinformation combatted now? Below are a few of the ways according to the Brookings Institution (2):")
st.markdown("1. Asking news sources to call out misinformation")
st.markdown("2. Teaching and encouraging media literacy among the public (how to detect fake news, critical evaluation of information provided, etc.)")
st.markdown("3. Encouraging independent journalism while avoiding government censorship of news")
st.markdown("4. Social media platform investment in algorithmic detection of fake news")
st.markdown("However, many of the proposed solutions above require the adoption of new behaviors. This is difficult to achieve, particularly among news organizations and social media platforms, which receive monetary benefits from misinformation in the form of ad revenue from site usage and viewership.")

# section 3: how can transformers help?
st.markdown("### How can Transformers Help?💡")
st.markdown("**FEVER**")
st.markdown("* FEVER, or Fact Extraction and VERification, was introduced in 2018 as the first dataset containing {fact, evidence, entailment_label} information. Its authors extracted and altered sentences from Wikipedia and had annotators report the relationship between the sentences: entailment, contradiction, or not enough information.")
st.markdown("* Since then, other researchers have expanded on this area in different domains")
st.markdown("* Here, we use Climate FEVER (3), a similar dataset developed and annotated by ")
st.markdown("**Fact Verification / Fact-Checking**")
st.markdown("* This is simply an extension of the textual entailment task")
st.markdown("* Given two sentences, sent1 and sent2, determine the relationship: entail, contradict, neutral")
st.markdown("* With fact verification, we can think of the sentences as claim and evidence and the labels as support, refute, or not enough information to refute or support.")

# section 4: the process
# this is the pipeline from my notes (with a "you are here" highlight)
st.markdown("### The Process 🔑")
st.markdown("Imagine: A person is curious about whether a claim they heard about climate change is true. How can transformers help validate or refute the claim?")
st.markdown("1. User inputs a climate claim")
st.markdown(
    "2. Retrieve evidence related to the input claim "
    "- For each claim, collect N related documents. These documents are selected by finding the N documents with the highest similarity scores to the claim. A current area of research: How do we keep the set of curated documents up to date and validate their contents?"
)
st.markdown("3. Send (claim, evidence) pairs to a transformer model. Have the model predict whether each piece of evidence supports, refutes, or is not relevant to the claim. (📍 YOU ARE HERE!)")
st.markdown("4. Report back to the user: The supporting evidence for the claim (if any), the refuting evidence for the claim (if any). If no relevant evidence is found, report that the claim cannot be supported or refuted by current evidence.")

# section 5: my work
st.markdown("### Climate Claim Fact-Checking with Transformers")
st.markdown("My work focuses on step 3 of the process: Training a transformer model to accurately categorize (claim, evidence) as:")
st.markdown("* evidence *supports* (entails) claim")
st.markdown("* evidence *refutes* (contradicts) claim")
st.markdown("* evidence *does not provide enough info to support or refute* (neutral) claim")
st.markdown("For this project, I fine-tune ClimateBERT (4) on the textual entailment task")

st.markdown("## Try it out!")

# load the fine-tuned fact-checking model (requires the hf_token environment variable)
txt_class = pipeline('text-classification', model='amandakonet/climatebert-fact-checking', use_auth_token=os.environ["hf_token"])

option = st.selectbox('Select a climate claim to test', df['claim'].unique())
st.write('You selected:', option)
st.write(type(option))  # debug output: shows the Python type of the selected claim

# section 6: analysis
st.markdown("## Critical Analysis")
st.markdown("What else could we do?")
st.markdown("* With more training data, the model's performance could likely be improved substantially. This is just a proof of concept")
st.markdown("* This is only one small part of the puzzle!")
st.markdown("In the complete pipeline (from user input to final output), we could move from just outputting evidence to training a transformer to reply with persuasive evidence. That is, instead of simply saying, \"This claim is supported by this evidence\", the model could transform the evidence into a persuasive argument, thus combatting climate change misinfo in a more palatable and convincing way.")

# References + Resource Links
st.markdown("## Resource Links")
st.markdown("### References")
st.markdown("My [huggingface model card](https://huggingface.co/amandakonet/climatebert-fact-checking), [adopted Climate FEVER dataset card](https://huggingface.co/datasets/amandakonet/climate_fever_adopted), and [project code on github](https://github.com/amandakonet/climate-change-misinformation)")
st.markdown("1. https://www.carbonbrief.org/guest-post-how-climate-change-misinformation-spreads-online")
st.markdown("2. https://www.brookings.edu/research/how-to-combat-fake-news-and-disinformation/")
st.markdown("3. Climate FEVER [paper](https://arxiv.org/abs/2012.00614), [huggingface repo](https://huggingface.co/datasets/climate_fever), and [github](https://github.com/huggingface/datasets/tree/master/datasets/climate_fever)")
st.markdown("4. [ClimateBERT](https://climatebert.ai/), [paper](https://arxiv.org/abs/2110.12010)")
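
# A minimal sketch of step 4 of the process described above (reporting back to
# the user). It is not part of the original app. It assumes one prediction dict
# with a "label" key per evidence sentence, which is the shape a transformers
# text-classification pipeline returns. The helper name `build_report` and the
# default label strings "entailment" and "contradiction" are assumptions for
# illustration only; check the fine-tuned model's config for its actual labels.
def build_report(evidence, predictions, supports_label="entailment", refutes_label="contradiction"):
    """Group evidence sentences by whether they support or refute a claim."""
    supporting = []
    refuting = []
    for sentence, prediction in zip(evidence, predictions):
        label = prediction["label"].lower()
        if label == supports_label:
            supporting.append(sentence)
        elif label == refutes_label:
            refuting.append(sentence)
        # any other label is treated as "not enough information" and skipped
    return {
        "supporting": supporting,
        "refuting": refuting,
        # True when the claim can be neither supported nor refuted
        "no_relevant_evidence": not supporting and not refuting,
    }

# Possible usage (hypothetical, not executed here): predictions could come from
# calling txt_class on {"text": ..., "text_pair": ...} dicts, one per evidence
# sentence; which of claim and evidence goes first depends on how the model was
# fine-tuned and is not specified in this script.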