anakin87 committed on
Commit
dbd4f9e
·
1 Parent(s): 28c4c1d

add presentation/slides

Files changed (2)
  1. README.md +10 -1
  2. presentation/fact_checking_rocks.pdf +3 -0
README.md CHANGED
@@ -19,9 +19,11 @@ license: apache-2.0
19
  - [Fact Checking 🎸 Rocks!](#fact-checking--rocks---)
20
  - [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
21
  - [Idea](#idea)
 
22
  - [System description](#system-description)
23
  - [Indexing pipeline](#indexing-pipeline)
24
  - [Search pipeline](#search-pipeline)
 
25
  - [Limits and possible improvements](#limits-and-possible-improvements)
26
  - [Repository structure](#repository-structure)
27
  - [Installation](#installation)
@@ -34,10 +36,14 @@ In a nutshell, the flow is as follows:
34
  * the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
35
  * the entailment scores are aggregated to produce a summary score.
36
 
 
 
 
 
 
37
  ### System description
38
  🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework to realize search system. The main components of our system are an indexing pipeline and a search pipeline.
39
 
40
-
41
  #### Indexing pipeline
42
  * [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): Crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [python wrapper](https://github.com/goldsmith/Wikipedia)
43
  * [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
@@ -56,6 +62,9 @@ In a nutshell, the flow is as follows:
56
  * aggregate the text entailment scores: compute the weighted average of them, where the weight is the relevance score. **Now it is possible to tell if the knowledge base confirms, is neutral or disproves the user statement.**
57
  * *empirical consideration: if in the first N passages (N<K), there is strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider (K-N) less relevant documents.*
58
 
 
 
 
59
  ### Limits and possible improvements
60
  ✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
61
  * there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
 
19
  - [Fact Checking 🎸 Rocks!](#fact-checking--rocks---)
20
  - [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
21
  - [Idea](#idea)
22
+ - [Presentation](#presentation)
23
  - [System description](#system-description)
24
  - [Indexing pipeline](#indexing-pipeline)
25
  - [Search pipeline](#search-pipeline)
26
+ - [Explain using a LLM](#explain-using-a-llm)
27
  - [Limits and possible improvements](#limits-and-possible-improvements)
28
  - [Repository structure](#repository-structure)
29
  - [Installation](#installation)
 
36
  * the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
37
  * the entailment scores are aggregated to produce a summary score.
38
 
39
+ ### Presentation
40
+
41
+ - [🍿 Video presentation @ Berlin Buzzwords 2023](https://www.youtube.com/watch?v=4L8Iw9CZNbU)
42
+ - [🧑‍🏫 Slides](./presentation/fact_checking_rocks.pdf)
43
+
44
  ### System description
45
  🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework to realize search system. The main components of our system are an indexing pipeline and a search pipeline.
46
 
 
47
  #### Indexing pipeline
48
  * [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): Crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [python wrapper](https://github.com/goldsmith/Wikipedia)
49
  * [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
 
62
  * aggregate the text entailment scores: compute the weighted average of them, where the weight is the relevance score. **Now it is possible to tell if the knowledge base confirms, is neutral or disproves the user statement.**
63
  * *empirical consideration: if in the first N passages (N<K), there is strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider (K-N) less relevant documents.*
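The aggregation step described in the two bullets above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `aggregate_entailment`, the input layout (a list of `(relevance, scores)` pairs sorted by decreasing relevance) and the label names are assumptions made for the example. Only the weighted average and the 0.5 early-stop threshold come from the README text.

```python
from typing import Dict, List, Tuple

LABELS = ("entailment", "neutral", "contradiction")


def aggregate_entailment(
    passages: List[Tuple[float, Dict[str, float]]],
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], int]:
    """Relevance-weighted average of per-passage entailment scores.

    `passages` is assumed to be sorted by decreasing relevance. Each item is
    (relevance_score, {"entailment": .., "neutral": .., "contradiction": ..}).
    Returns the aggregate scores and the number of passages actually used.
    """
    totals = {label: 0.0 for label in LABELS}
    aggregate = dict(totals)
    weight_sum = 0.0
    used = 0

    for relevance, scores in passages:
        weight_sum += relevance
        for label in LABELS:
            totals[label] += relevance * scores[label]
        used += 1

        if weight_sum == 0:
            # No usable weight yet; keep reading passages.
            continue

        aggregate = {label: totals[label] / weight_sum for label in LABELS}

        # Empirical early stop: strong partial evidence of entailment or
        # contradiction means the less relevant passages are not considered.
        if max(aggregate["entailment"], aggregate["contradiction"]) > threshold:
            break

    return aggregate, used
```

With this layout, a strongly entailing top passage stops the loop after one item, while weak evidence makes the function read all K passages before returning the averaged scores.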
64
 
65
+ #### Explain using a LLM
66
+ * if there is entailment or contradiction, prompt `google/flan-t5-large`, asking why the relevant textual passages entail/contradict the user statement.
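As a rough sketch of this step, the prompt could be assembled as below. The helper name `build_explanation_prompt` and the prompt wording are hypothetical and are not taken from the repository. The sketch only builds the text; sending it to `google/flan-t5-large` is left out.

```python
from typing import List, Optional


def build_explanation_prompt(
    statement: str,
    passages: List[str],
    label: str,
) -> Optional[str]:
    """Build a prompt asking why the passages entail/contradict the statement.

    Hypothetical helper: the wording below is an assumption for illustration.
    Returns None when the outcome is neutral, since the README only asks for
    an explanation in case of entailment or contradiction.
    """
    verbs = {"entailment": "entail", "contradiction": "contradict"}
    if label not in verbs:
        return None

    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        f"Passages:\n{context}\n\n"
        f"Statement: {statement}\n\n"
        f"Explain why the passages {verbs[label]} the statement."
    )
```

Returning `None` for the neutral case keeps the caller's logic simple: the model is only queried when there is something to explain.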
67
+
68
  ### Limits and possible improvements
69
  ✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
70
  * there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
presentation/fact_checking_rocks.pdf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:de44af8827f3f36648726176d51b09a009528b9168dd0cdef9c4a687ad62247f
3
+ size 2737149