dlsmallw committed
Commit cfb32a4 · 1 Parent(s): bd63b1f

Task-331 Complete about page documentation

Files changed (3)
  1. README.md +26 -0
  2. app.py +56 -24
  3. docs/index.md +26 -0
README.md CHANGED
@@ -14,12 +14,38 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 
# NLPinitiative Streamlit Web Application
 
+ Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user, using the models trained within the NLPinitiative project.
+
---
 
## Project Details
 
### Description
 
+ The NLPinitiative Discriminatory Text Classifier is an advanced natural language processing tool designed to detect and flag potentially discriminatory or harmful language. By analyzing text for biased, offensive, or exclusionary content, this classifier helps promote more inclusive and respectful communication. Simply enter your text into the classifier, and the model will assess it based on linguistic patterns and context. While the tool provides valuable insights, we encourage users to review flagged content thoughtfully and consider context when interpreting results.
+
+ This project was developed as part of a sponsored project for **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>** for the purpose of detecting discriminatory speech from public officials and news agencies targeting marginalized communities.
+
+ ---
+
+ ### How The Tool Works
+
+ The application utilizes two fine-tuned NLP models:
+
+ - A binary classifier for classifying input as Discriminatory or Non-Discriminatory (prediction classes of 1 and 0, respectively).
+ - A multilabel regression model for assessing the likelihood of specific categories of discrimination
+ (Gender, Race, Sexuality, Disability, Religion and Unspecified), with values ranging from 0.0 (no confidence) to 1.0 (max confidence).
+
+ Both models use the pretrained **<a href="https://doi.org/10.48550/arXiv.1810.04805" style="text-decoration:none">BERT</a>** (Bidirectional Encoder Representations from Transformers) model as their base and were fine-tuned using the master dataset (which can be viewed on the Datasets tab). The master dataset includes data extracted and reformatted for use in training these models from the **<a href="https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset" style="text-decoration:none">ETHOS dataset</a>** and the **<a href="https://github.com/marcoguerini/CONAN?tab=readme-ov-file#multitarget-conan" style="text-decoration:none">Multitarget-CONAN dataset</a>**.
+
+ ---
+
+ ### Project Links
+ * **<a href="https://github.com/dlsmallw/NLPinitiative" style="text-decoration:none"><img src="https://raw.githubusercontent.com/tandpfun/skill-icons/refs/heads/main/icons/Github-Dark.svg" style="margin-right: 3px;" width="20" height="20"/> NLPinitiative GitHub Project</a>** - The training/evaluation pipeline used for fine-tuning the models.
+ * **<a href="https://huggingface.co/{BIN_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Binary Classification Model Repository</a>** - The Hugging Face hosted Binary Classification Model Repository.
+ * **<a href="https://huggingface.co/{ML_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Multilabel Regression Model Repository</a>** - The Hugging Face hosted Multilabel Regression Model Repository.
+ * **<a href="https://huggingface.co/{DATASET_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Dataset Repository</a>** - The Hugging Face hosted Dataset Repository.
+
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user, using the models trained within the NLPinitiative project.
 
---
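
The "How The Tool Works" text added above describes two fine-tuned BERT-based models hosted in Hugging Face model repositories: a binary classifier (classes 1 and 0) and a multilabel regressor scored between 0.0 and 1.0. The sketch below shows roughly how such a pair of models could be loaded and queried; the repo IDs, the shared tokenizer, and the sigmoid post-processing are illustrative assumptions rather than the project's actual `scripts.predict.InferenceHandler` logic.

```python
# Illustrative sketch (not the project's actual InferenceHandler): load the two
# fine-tuned BERT models described above and score a piece of text.
# The repo IDs below are placeholders for the real BIN_REPO / ML_REPO values.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BIN_REPO = "your-org/nlpinitiative-binary"       # placeholder
ML_REPO = "your-org/nlpinitiative-multilabel"    # placeholder
CATEGORIES = ["Gender", "Race", "Sexuality", "Disability", "Religion", "Unspecified"]

# Both models are BERT fine-tunes, so a single shared tokenizer is assumed here.
tokenizer = AutoTokenizer.from_pretrained(BIN_REPO)
bin_model = AutoModelForSequenceClassification.from_pretrained(BIN_REPO)  # 2 logits: 0 / 1
ml_model = AutoModelForSequenceClassification.from_pretrained(ML_REPO)    # 6 category outputs

def score(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Binary head: argmax over two logits -> 1 (Discriminatory) or 0 (Non-Discriminatory).
        is_discriminatory = int(bin_model(**inputs).logits.argmax(dim=-1))
        # Multilabel head: squash each category score into the 0.0-1.0 range.
        category_scores = torch.sigmoid(ml_model(**inputs).logits).squeeze(0).tolist()
    return {
        "discriminatory": bool(is_discriminatory),
        "categories": dict(zip(CATEGORIES, category_scores)),
    }

print(score("Example sentence to score."))
```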
app.py CHANGED
@@ -9,7 +9,12 @@ from annotated_text import annotation
from scripts.predict import InferenceHandler
from huggingface_hub import snapshot_download

- from scripts.config import DATASET_REPO
+ from scripts.config import (
+     BIN_REPO,
+     ML_REPO,
+     DATASET_REPO
+ )
+

nest_asyncio.apply()
st.set_page_config(layout='wide')
@@ -271,7 +276,7 @@ tab2 = st.empty()
tab4 = st.empty()
tab3 = st.empty()

- tab1, tab2, tab3, tab4 = st.tabs(['Classifier', 'Input History', 'Datasets', 'About This App'])
+ tab1, tab2, tab3, tab4 = st.tabs(['Classifier', 'About This App', 'Input History', 'Datasets'])

if "results" not in st.session_state:
    st.session_state.results = []
@@ -289,6 +294,54 @@ with tab1:
        analyze_text(text_area)

with tab2:
+     st.markdown(
+         f"""
+         The NLPinitiative Discriminatory Text Classifier is an advanced
+         natural language processing tool designed to detect and flag potentially
+         discriminatory or harmful language. By analyzing text for biased, offensive,
+         or exclusionary content, this classifier helps promote more inclusive and
+         respectful communication. Simply enter your text into the classifier, and the model will
+         assess it based on linguistic patterns and context. While the tool provides
+         valuable insights, we encourage users to review flagged content thoughtfully
+         and consider context when interpreting results.
+
+         This project was developed as part of a sponsored project for
+         **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>** for the purpose of
+         detecting discriminatory speech from public officials and news agencies targeting
+         marginalized communities.
+
+         <hr style="margin: 0 0 0.5em 0;">
+
+         ### How The Tool Works
+
+         The application utilizes two fine-tuned NLP models:
+
+         - A binary classifier for classifying input as Discriminatory or Non-Discriminatory (prediction classes of 1 and 0, respectively).
+         - A multilabel regression model for assessing the likelihood of specific categories of discrimination
+         (Gender, Race, Sexuality, Disability, Religion and Unspecified), with values ranging from 0.0 (no confidence) to 1.0 (max confidence).
+
+         Both models use the pretrained **<a href="https://doi.org/10.48550/arXiv.1810.04805" style="text-decoration:none">BERT</a>** (Bidirectional Encoder Representations from Transformers)
+         model as their base and were fine-tuned using the master dataset (which can be viewed on the Datasets tab). The master dataset includes data extracted
+         and reformatted for use in training these models from the **<a href="https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset" style="text-decoration:none">ETHOS dataset</a>** and
+         the **<a href="https://github.com/marcoguerini/CONAN?tab=readme-ov-file#multitarget-conan" style="text-decoration:none">Multitarget-CONAN dataset</a>**.
+
+         <hr style="margin: 0 0 0.5em 0;">
+
+         ### Project Links
+         * **<a href="https://github.com/dlsmallw/NLPinitiative" style="text-decoration:none"><img src="https://raw.githubusercontent.com/tandpfun/skill-icons/refs/heads/main/icons/Github-Dark.svg" style="margin-right: 3px;" width="20" height="20"/> NLPinitiative GitHub Project</a>** - The training/evaluation pipeline used for fine-tuning the models.
+         * **<a href="https://huggingface.co/{BIN_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Binary Classification Model Repository</a>** - The Hugging Face hosted Binary Classification Model Repository.
+         * **<a href="https://huggingface.co/{ML_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Multilabel Regression Model Repository</a>** - The Hugging Face hosted Multilabel Regression Model Repository.
+         * **<a href="https://huggingface.co/{DATASET_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Dataset Repository</a>** - The Hugging Face hosted Dataset Repository.
+
+         <hr style="margin: 0 0 0.5em 0;">
+
+         A tool made by **<a href="mailto:[email protected]" style="text-decoration:none">Dan Smallwood</a>**, sponsored by **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>**.
+
+         """,
+         unsafe_allow_html=True
+     )
+
+ with tab3:
    hist_container = st.container(border=True)
    try:
        load_history(hist_container)
@@ -298,7 +351,7 @@ with tab2:
            unsafe_allow_html=True
        )

- with tab3:
+ with tab4:
    ds_container = st.container(border=True)
    try:
        load_datasets(ds_container, API_KEY)
@@ -309,24 +362,3 @@ with tab3:
            unsafe_allow_html=True
        )

- with tab4:
-     st.markdown(
-         f"""
-         ## About
-         The NLPinitiative Discriminatory Text Classifier is an advanced
-         natural language processing tool designed to detect and flag potentially
-         discriminatory or harmful language. By analyzing text for biased, offensive,
-         or exclusionary content, this classifier helps promote more inclusive and
-         respectful communication. Simply enter your text below, and the model will
-         assess it based on linguistic patterns and context. While the tool provides
-         valuable insights, we encourage users to review flagged content thoughtfully
-         and consider context when interpreting results.
-
-         The application utilizes two NLP models: a fine-tuned binary classifier for classifying input as
-         Discriminatory or Non-Discriminatory and a fine-tuned multilabel regression model for assessing
-         the likelihood of specific categories of discrimination (Gender, Race, Sexuality, Disability, Religion
-         and Unspecified). The base model used for both fine-tuned models is the pretrained
-         [BERT](https://doi.org/10.48550/arXiv.1810.04805) (Bidirectional Encoder Representations from Transformers)
-         model.
-         """
-     )
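
For reference, the app.py edits above follow a standard Streamlit layout pattern: build the tab set once with `st.tabs`, then render the About content into one tab as an HTML-flavoured Markdown f-string whose repository links come from `scripts.config`. A stripped-down sketch of that pattern is shown below; the constant values and link text are placeholders, not the project's real configuration.

```python
# Minimal sketch of the tab/markdown pattern used in the change above.
# The three constants stand in for the values imported from scripts.config.
import streamlit as st

BIN_REPO = "your-org/binary-model"        # placeholder for scripts.config.BIN_REPO
ML_REPO = "your-org/multilabel-model"     # placeholder for scripts.config.ML_REPO
DATASET_REPO = "your-org/dataset"         # placeholder for scripts.config.DATASET_REPO

# st.tabs returns one container per label; content is attached via `with` blocks.
tab1, tab2, tab3, tab4 = st.tabs(['Classifier', 'About This App', 'Input History', 'Datasets'])

with tab2:
    # unsafe_allow_html=True lets the raw <a>/<hr> tags in the About text render.
    st.markdown(
        f"""
        ### Project Links
        * <a href="https://huggingface.co/{BIN_REPO}">Binary Classification Model</a>
        * <a href="https://huggingface.co/{ML_REPO}">Multilabel Regression Model</a>
        * <a href="https://huggingface.co/{DATASET_REPO}">Dataset</a>
        """,
        unsafe_allow_html=True,
    )
```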
docs/index.md CHANGED
@@ -1,11 +1,37 @@
# NLPinitiative Streamlit Documentation

+ Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user, using the models trained within the NLPinitiative project.
+
---

## Project Details

### Description

+ The NLPinitiative Discriminatory Text Classifier is an advanced natural language processing tool designed to detect and flag potentially discriminatory or harmful language. By analyzing text for biased, offensive, or exclusionary content, this classifier helps promote more inclusive and respectful communication. Simply enter your text into the classifier, and the model will assess it based on linguistic patterns and context. While the tool provides valuable insights, we encourage users to review flagged content thoughtfully and consider context when interpreting results.
+
+ This project was developed as part of a sponsored project for **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>** for the purpose of detecting discriminatory speech from public officials and news agencies targeting marginalized communities.
+
+ ---
+
+ ### How The Tool Works
+
+ The application utilizes two fine-tuned NLP models:
+
+ - A binary classifier for classifying input as Discriminatory or Non-Discriminatory (prediction classes of 1 and 0, respectively).
+ - A multilabel regression model for assessing the likelihood of specific categories of discrimination
+ (Gender, Race, Sexuality, Disability, Religion and Unspecified), with values ranging from 0.0 (no confidence) to 1.0 (max confidence).
+
+ Both models use the pretrained **<a href="https://doi.org/10.48550/arXiv.1810.04805" style="text-decoration:none">BERT</a>** (Bidirectional Encoder Representations from Transformers) model as their base and were fine-tuned using the master dataset (which can be viewed on the Datasets tab). The master dataset includes data extracted and reformatted for use in training these models from the **<a href="https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset" style="text-decoration:none">ETHOS dataset</a>** and the **<a href="https://github.com/marcoguerini/CONAN?tab=readme-ov-file#multitarget-conan" style="text-decoration:none">Multitarget-CONAN dataset</a>**.
+
+ ---
+
+ ### Project Links
+ * **<a href="https://github.com/dlsmallw/NLPinitiative" style="text-decoration:none"><img src="https://raw.githubusercontent.com/tandpfun/skill-icons/refs/heads/main/icons/Github-Dark.svg" style="margin-right: 3px;" width="20" height="20"/> NLPinitiative GitHub Project</a>** - The training/evaluation pipeline used for fine-tuning the models.
+ * **<a href="https://huggingface.co/{BIN_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Binary Classification Model Repository</a>** - The Hugging Face hosted Binary Classification Model Repository.
+ * **<a href="https://huggingface.co/{ML_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Multilabel Regression Model Repository</a>** - The Hugging Face hosted Multilabel Regression Model Repository.
+ * **<a href="https://huggingface.co/{DATASET_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Dataset Repository</a>** - The Hugging Face hosted Dataset Repository.
+
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user, using the models trained within the NLPinitiative project.

---