Task-331 Complete about page documentation
- README.md +26 -0
- app.py +56 -24
- docs/index.md +26 -0
README.md
CHANGED
@@ -14,12 +14,38 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
|
|
14 |
|
15 |
# NLPinitiative Streamlit Web Application
|
16 |
|
17 |
---
|
18 |
|
19 |
## Project Details
|
20 |
|
21 |
### Description
|
22 |
|
23 |
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user using the models trained within the NLPinitiative project.
|
24 |
|
25 |
---
|
|
|
14 |
|
15 |
# NLPinitiative Streamlit Web Application
|
16 |
|
17 |
+
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user using the models trained within the NLPinitiative project.
|
18 |
+
|
19 |
---
|
20 |
|
21 |
## Project Details
|
22 |
|
23 |
### Description
|
24 |
|
25 |
+
The NLPinitiative Discriminatory Text Classifier is an advanced natural language processing tool designed to detect and flag potentially discriminatory or harmful language. By analyzing text for biased, offensive, or exclusionary content, this classifier helps promote more inclusive and respectful communication. Simply enter your text below, and the model will assess it based on linguistic patterns and context. While the tool provides valuable insights, we encourage users to review flagged content thoughtfully and consider context when interpreting results.
|
26 |
+
|
27 |
+
This project was developed as a sponsored project for **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>** to detect discriminatory speech from public officials and news agencies targeting marginalized communities.
|
28 |
+
|
29 |
+
---
|
30 |
+
|
31 |
+
### How The Tool Works
|
32 |
+
|
33 |
+
The application utilizes two fine-tuned NLP models:
|
34 |
+
|
35 |
+
- A binary classifier for classifying input as Discriminatory or Non-Discriminatory (prediction classes of 1 and 0 respectively).
|
36 |
+
- A multilabel regression model for assessing the likelihood of specific categories of discrimination
|
37 |
+
(Gender, Race, Sexuality, Disability, Religion and Unspecified) on a scale from 0.0 (no confidence) to 1.0 (max confidence).
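
The two outputs described above can be read together roughly as follows. This is a minimal sketch, not the project's inference code: it assumes the binary model emits two raw logits (index 0 for Non-Discriminatory, index 1 for Discriminatory) and that the regression model emits six raw values that are clamped to the 0.0 to 1.0 range. The function name `interpret` and the output layout are hypothetical.

```python
import math

# Category order is taken from the list above; the actual model output order is an assumption.
CATEGORIES = ["Gender", "Race", "Sexuality", "Disability", "Religion", "Unspecified"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def interpret(binary_logits, category_outputs):
    """Combine both model outputs into one readable result (hypothetical layout)."""
    probs = softmax(binary_logits)
    pred_class = 1 if probs[1] >= probs[0] else 0
    label = "Discriminatory" if pred_class == 1 else "Non-Discriminatory"
    # Clamp each regression output into [0.0, 1.0] so it reads as a confidence value.
    scores = {
        name: min(1.0, max(0.0, value))
        for name, value in zip(CATEGORIES, category_outputs)
    }
    return {
        "class": pred_class,
        "label": label,
        "confidence": probs[pred_class],
        "categories": scores,
    }


if __name__ == "__main__":
    print(interpret([0.2, 1.5], [0.9, 1.2, -0.1, 0.0, 0.3, 0.05]))
```

Keeping the binary decision and the per-category scores in one dictionary makes it easy to render both in a single results panel.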
|
38 |
+
|
39 |
+
Both models use the pretrained **<a href="https://doi.org/10.48550/arXiv.1810.04805" style="text-decoration:none">BERT</a>** (Bidirectional Encoder Representations from Transformers) as the base model and were fine-tuned on the master dataset (which can be viewed on the Datasets tab). The master dataset includes data extracted and reformatted for use in training these models from the **<a href="https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset" style="text-decoration:none">ETHOS dataset</a>** and the **<a href="https://github.com/marcoguerini/CONAN?tab=readme-ov-file#multitarget-conan" style="text-decoration:none">Multitarget-CONAN dataset</a>**.
|
40 |
+
|
41 |
+
---
|
42 |
+
|
43 |
+
### Project Links
|
44 |
+
* **<a href="https://github.com/dlsmallw/NLPinitiative" style="text-decoration:none"><img src="https://raw.githubusercontent.com/tandpfun/skill-icons/refs/heads/main/icons/Github-Dark.svg" style="margin-right: 3px;" width="20" height="20"/> NLPinitiative GitHub Project</a>** - The training/evaluation pipeline used for fine-tuning the models.
|
45 |
+
* **<a href="https://huggingface.co/{BIN_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Binary Classification Model Repository</a>** - The Hugging Face hosted Binary Classification Model Repository.
|
46 |
+
* **<a href="https://huggingface.co/{ML_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Multilabel Regression Model Repository</a>** - The Hugging Face hosted Multilabel Regression Model Repository.
|
47 |
+
* **<a href="https://huggingface.co/{DATASET_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Dataset Repository</a>** - The Hugging Face hosted Dataset Repository.
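
The `{BIN_REPO}`, `{ML_REPO}` and `{DATASET_REPO}` placeholders in the links above are only filled in where the text passes through a Python f-string, as in `app.py`; in a static Markdown file they stay literal. The sketch below shows one way such links might be built. The helper `hf_link` and the repo ids are hypothetical, and it assumes each constant holds a bare `owner/name` repo id, in which case Hugging Face dataset pages live under the `datasets/` path prefix.

```python
def hf_link(repo_id, text, repo_type="model"):
    """Build an HTML anchor for a Hugging Face Hub repository (hypothetical helper)."""
    # Dataset repositories are served under /datasets/<repo_id>; models sit at the root.
    prefix = "datasets/" if repo_type == "dataset" else ""
    url = f"https://huggingface.co/{prefix}{repo_id}"
    return f'<a href="{url}" style="text-decoration:none">{text}</a>'


if __name__ == "__main__":
    # Placeholder ids for illustration only; the real values come from scripts.config.
    print(hf_link("example-org/binary-model", "Binary Classification Model"))
    print(hf_link("example-org/master-dataset", "Dataset", repo_type="dataset"))
```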
|
48 |
+
|
49 |
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user using the models trained within the NLPinitiative project.
|
50 |
|
51 |
---
|
app.py
CHANGED
@@ -9,7 +9,12 @@ from annotated_text import annotation
|
|
9 |
from scripts.predict import InferenceHandler
|
10 |
from huggingface_hub import snapshot_download
|
11 |
|
12 |
-
from scripts.config import
|
13 |
|
14 |
nest_asyncio.apply()
|
15 |
st.set_page_config(layout='wide')
|
@@ -271,7 +276,7 @@ tab2 = st.empty()
|
|
271 |
tab4 = st.empty()
|
272 |
tab3 = st.empty()
|
273 |
|
274 |
-
tab1, tab2, tab3, tab4 = st.tabs(['Classifier', '
|
275 |
|
276 |
if "results" not in st.session_state:
|
277 |
st.session_state.results = []
|
@@ -289,6 +294,54 @@ with tab1:
|
|
289 |
analyze_text(text_area)
|
290 |
|
291 |
with tab2:
|
292 |
hist_container = st.container(border=True)
|
293 |
try:
|
294 |
load_history(hist_container)
|
@@ -298,7 +351,7 @@ with tab2:
|
|
298 |
unsafe_allow_html=True
|
299 |
)
|
300 |
|
301 |
-
with
|
302 |
ds_container = st.container(border=True)
|
303 |
try:
|
304 |
load_datasets(ds_container, API_KEY)
|
@@ -309,24 +362,3 @@ with tab3:
|
|
309 |
unsafe_allow_html=True
|
310 |
)
|
311 |
|
312 |
-
with tab4:
|
313 |
-
st.markdown(
|
314 |
-
f"""
|
315 |
-
## About
|
316 |
-
The NLPinitiative Discriminatory Text Classifier is an advanced
|
317 |
-
natural language processing tool designed to detect and flag potentially
|
318 |
-
discriminatory or harmful language. By analyzing text for biased, offensive,
|
319 |
-
or exclusionary content, this classifier helps promote more inclusive and
|
320 |
-
respectful communication. Simply enter your text below, and the model will
|
321 |
-
assess it based on linguistic patterns and context. While the tool provides
|
322 |
-
valuable insights, we encourage users to review flagged content thoughtfully
|
323 |
-
and consider context when interpreting results.
|
324 |
-
|
325 |
-
The application utilizes two NLP models: a fine-tuned binary classifier for classifying input as
|
326 |
-
Discriminatory or Non-Discriminatory and a fine-tuned multilabel regression model for assessing
|
327 |
-
the likelihood of specific categories of discrimination (Gender, Race, Sexuality, Disability, Religion
|
328 |
-
and Unspecified). The base model used for both fine-tuned models is the pretrained
|
329 |
-
[BERT](https://doi.org/10.48550/arXiv.1810.04805) (Bidirectional Encoder Representations from Transformers)
|
330 |
-
model.
|
331 |
-
"""
|
332 |
-
)
|
|
|
9 |
from scripts.predict import InferenceHandler
|
10 |
from huggingface_hub import snapshot_download
|
11 |
|
12 |
+
from scripts.config import (
|
13 |
+
BIN_REPO,
|
14 |
+
ML_REPO,
|
15 |
+
DATASET_REPO
|
16 |
+
)
|
17 |
+
|
18 |
|
19 |
nest_asyncio.apply()
|
20 |
st.set_page_config(layout='wide')
|
|
|
276 |
tab4 = st.empty()
|
277 |
tab3 = st.empty()
|
278 |
|
279 |
+
tab1, tab2, tab3, tab4 = st.tabs(['Classifier', 'About This App', 'Input History', 'Datasets'])
|
280 |
|
281 |
if "results" not in st.session_state:
|
282 |
st.session_state.results = []
|
|
|
294 |
analyze_text(text_area)
|
295 |
|
296 |
with tab2:
|
297 |
+
st.markdown(
|
298 |
+
f"""
|
299 |
+
The NLPinitiative Discriminatory Text Classifier is an advanced
|
300 |
+
natural language processing tool designed to detect and flag potentially
|
301 |
+
discriminatory or harmful language. By analyzing text for biased, offensive,
|
302 |
+
or exclusionary content, this classifier helps promote more inclusive and
|
303 |
+
respectful communication. Simply enter your text below, and the model will
|
304 |
+
assess it based on linguistic patterns and context. While the tool provides
|
305 |
+
valuable insights, we encourage users to review flagged content thoughtfully
|
306 |
+
and consider context when interpreting results.
|
307 |
+
|
308 |
+
This project was developed as part of a sponsored project for
|
309 |
+
**<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>** for the purpose of
|
310 |
+
detecting discriminatory speech from public officials and news agencies targeting
|
311 |
+
marginalized communities.
|
312 |
+
|
313 |
+
<hr style="margin: 0 0 0.5em 0;">
|
314 |
+
|
315 |
+
### How The Tool Works
|
316 |
+
|
317 |
+
The application utilizes two fine-tuned NLP models:
|
318 |
+
|
319 |
+
- A binary classifier for classifying input as Discriminatory or Non-Discriminatory (prediction classes of 1 and 0 respectively).
|
320 |
+
- A multilabel regression model for assessing the likelihood of specific categories of discrimination
|
321 |
+
(Gender, Race, Sexuality, Disability, Religion and Unspecified) on a scale from 0.0 (no confidence) to 1.0 (max confidence).
|
322 |
+
|
323 |
+
Both models use the pretrained **<a href="https://doi.org/10.48550/arXiv.1810.04805" style="text-decoration:none">BERT</a>** (Bidirectional Encoder Representations from Transformers)
|
324 |
+
as the base model and were fine-tuned on the master dataset (which can be viewed on the Datasets tab). The master dataset includes data extracted
|
325 |
+
and reformatted for use in training these models from the **<a href="https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset" style="text-decoration:none">ETHOS dataset</a>** and
|
326 |
+
the **<a href="https://github.com/marcoguerini/CONAN?tab=readme-ov-file#multitarget-conan" style="text-decoration:none">Multitarget-CONAN dataset</a>**.
|
327 |
+
|
328 |
+
<hr style="margin: 0 0 0.5em 0;">
|
329 |
+
|
330 |
+
### Project Links
|
331 |
+
* **<a href="https://github.com/dlsmallw/NLPinitiative" style="text-decoration:none"><img src="https://raw.githubusercontent.com/tandpfun/skill-icons/refs/heads/main/icons/Github-Dark.svg" style="margin-right: 3px;" width="20" height="20"/> NLPinitiative GitHub Project</a>** - The training/evaluation pipeline used for fine-tuning the models.
|
332 |
+
* **<a href="https://huggingface.co/{BIN_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Binary Classification Model Repository</a>** - The Hugging Face hosted Binary Classification Model Repository.
|
333 |
+
* **<a href="https://huggingface.co/{ML_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Multilabel Regression Model Repository</a>** - The Hugging Face hosted Multilabel Regression Model Repository.
|
334 |
+
* **<a href="https://huggingface.co/{DATASET_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Dataset Repository</a>** - The Hugging Face hosted Dataset Repository.
|
335 |
+
|
336 |
+
<hr style="margin: 0 0 0.5em 0;">
|
337 |
+
|
338 |
+
A tool made by **<a href="mailto:[email protected]" style="text-decoration:none">Dan Smallwood</a>** sponsored by **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>**.
|
339 |
+
|
340 |
+
""",
|
341 |
+
unsafe_allow_html=True
|
342 |
+
)
|
343 |
+
|
344 |
+
with tab3:
|
345 |
hist_container = st.container(border=True)
|
346 |
try:
|
347 |
load_history(hist_container)
|
|
|
351 |
unsafe_allow_html=True
|
352 |
)
|
353 |
|
354 |
+
with tab4:
|
355 |
ds_container = st.container(border=True)
|
356 |
try:
|
357 |
load_datasets(ds_container, API_KEY)
|
|
|
362 |
unsafe_allow_html=True
|
363 |
)
|
364 |
docs/index.md
CHANGED
@@ -1,11 +1,37 @@
|
|
1 |
# NLPinitiative Streamlit Documentation
|
2 |
|
3 |
---
|
4 |
|
5 |
## Project Details
|
6 |
|
7 |
### Description
|
8 |
|
9 |
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user using the models trained within the NLPinitiative project.
|
10 |
|
11 |
---
|
|
|
1 |
# NLPinitiative Streamlit Documentation
|
2 |
|
3 |
+
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user using the models trained within the NLPinitiative project.
|
4 |
+
|
5 |
---
|
6 |
|
7 |
## Project Details
|
8 |
|
9 |
### Description
|
10 |
|
11 |
+
The NLPinitiative Discriminatory Text Classifier is an advanced natural language processing tool designed to detect and flag potentially discriminatory or harmful language. By analyzing text for biased, offensive, or exclusionary content, this classifier helps promote more inclusive and respectful communication. Simply enter your text below, and the model will assess it based on linguistic patterns and context. While the tool provides valuable insights, we encourage users to review flagged content thoughtfully and consider context when interpreting results.
|
12 |
+
|
13 |
+
This project was developed as a sponsored project for **<a href="https://www.j-initiative.org/" style="text-decoration:none">The J-Healthcare Initiative</a>** to detect discriminatory speech from public officials and news agencies targeting marginalized communities.
|
14 |
+
|
15 |
+
---
|
16 |
+
|
17 |
+
### How The Tool Works
|
18 |
+
|
19 |
+
The application utilizes two fine-tuned NLP models:
|
20 |
+
|
21 |
+
- A binary classifier for classifying input as Discriminatory or Non-Discriminatory (prediction classes of 1 and 0 respectively).
|
22 |
+
- A multilabel regression model for assessing the likelihood of specific categories of discrimination
|
23 |
+
(Gender, Race, Sexuality, Disability, Religion and Unspecified) on a scale from 0.0 (no confidence) to 1.0 (max confidence).
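
Because each category receives its own confidence value, a reader typically cares only about the categories that score high. The sketch below shows one way to pick them out. The function name `top_categories` and the 0.5 cutoff are assumptions for illustration; the application itself may present the scores differently.

```python
def top_categories(scores, threshold=0.5):
    """Return (category, score) pairs at or above the threshold, highest first.

    `scores` maps category names to confidence values in [0.0, 1.0].
    The default threshold of 0.5 is an illustrative choice, not a project setting.
    """
    flagged = [(name, score) for name, score in scores.items() if score >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    example = {
        "Gender": 0.8,
        "Race": 0.2,
        "Sexuality": 0.1,
        "Disability": 0.0,
        "Religion": 0.6,
        "Unspecified": 0.3,
    }
    for name, score in top_categories(example):
        print(f"{name}: {score:.2f}")
```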
|
24 |
+
|
25 |
+
Both models use the pretrained **<a href="https://doi.org/10.48550/arXiv.1810.04805" style="text-decoration:none">BERT</a>** (Bidirectional Encoder Representations from Transformers) as the base model and were fine-tuned on the master dataset (which can be viewed on the Datasets tab). The master dataset includes data extracted and reformatted for use in training these models from the **<a href="https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset" style="text-decoration:none">ETHOS dataset</a>** and the **<a href="https://github.com/marcoguerini/CONAN?tab=readme-ov-file#multitarget-conan" style="text-decoration:none">Multitarget-CONAN dataset</a>**.
|
26 |
+
|
27 |
+
---
|
28 |
+
|
29 |
+
### Project Links
|
30 |
+
* **<a href="https://github.com/dlsmallw/NLPinitiative" style="text-decoration:none"><img src="https://raw.githubusercontent.com/tandpfun/skill-icons/refs/heads/main/icons/Github-Dark.svg" style="margin-right: 3px;" width="20" height="20"/> NLPinitiative GitHub Project</a>** - The training/evaluation pipeline used for fine-tuning the models.
|
31 |
+
* **<a href="https://huggingface.co/{BIN_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Binary Classification Model Repository</a>** - The Hugging Face hosted Binary Classification Model Repository.
|
32 |
+
* **<a href="https://huggingface.co/{ML_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Multilabel Regression Model Repository</a>** - The Hugging Face hosted Multilabel Regression Model Repository.
|
33 |
+
* **<a href="https://huggingface.co/{DATASET_REPO}" style="text-decoration:none">🤗 NLPinitiative HF Dataset Repository</a>** - The Hugging Face hosted Dataset Repository.
|
34 |
+
|
35 |
Codebase for the Streamlit app hosted on Hugging Face Spaces that provides a basic user interface for performing inference on text input by the user using the models trained within the NLPinitiative project.
|
36 |
|
37 |
---
|