Update pages/NLP.py
pages/NLP.py  CHANGED  +4 -2
@@ -16,7 +16,8 @@ def show_page(page):
 st.title("Text preprocessing")
 st.markdown(
 """
-### Text preprocessing
+### Text preprocessing
+Text preprocessing converts raw text into a format that computer models can understand and process, while preserving as much of the meaning and context of the original language as possible. It is usually carried out in several steps, and the exact steps depend on the nature of the text and the goals of the NLP task.
 - **Tokenization**: It breaks text down into smaller units called tokens. These tokens can be words, characters, or punctuation marks. For example, the sentence “I want to learn NLP.” would be tokenized into: “I”, “want”, “to”, “learn”, “NLP”, “.”.
 - **Stop Words**: Stop words are common words that carry little meaning on their own, such as “is”, “the”, and “and”. Removing them makes it easier to focus on the meaningful words.
 - **Stemming**: Stemming strips away suffixes and reduces words to their base form. For example, “going” is reduced to “go”.
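The three steps listed in this hunk (tokenization, stop word removal, stemming) can be tried end to end in a few lines. The sketch below uses NLTK purely as an illustration; the commit does not name a library, so `word_tokenize`, `stopwords`, and `PorterStemmer` are assumptions rather than code from pages/NLP.py.

```python
# Illustrative sketch of the preprocessing steps described above (NLTK assumed).
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Tokenizer models and stop word lists; newer NLTK releases also use "punkt_tab".
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)
nltk.download("stopwords", quiet=True)

text = "I want to learn NLP."

# 1. Tokenization: split the sentence into word and punctuation tokens.
tokens = word_tokenize(text)
print(tokens)  # ['I', 'want', 'to', 'learn', 'NLP', '.']

# 2. Stop word removal: drop common words such as "I" and "to".
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)

# 3. Stemming: reduce each remaining word to its base form, e.g. "going" -> "go".
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in filtered])
```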
@@ -33,7 +34,8 @@
 st.title("Vectorization")
 st.markdown(
 """
-### Vectorization
+### Vectorization
+Vectorization in NLP is the process of converting text into numbers so that a computer can understand and analyze it. Since machines cannot read words the way humans do, text has to be transformed into a format they can process: numerical vectors.
 **One-Hot Vectorization**:
 One-Hot Vectorization is a way to represent words as numbers so that computers can understand them. It works by creating a unique binary vector for each word, where only one position is 1 and all other positions are 0.
 #### Example:
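One-hot vectorization as described in this hunk needs nothing more than a vocabulary and an index per word. Below is a minimal plain-Python sketch of the idea; the toy sentence, the `one_hot` helper, and the sorted vocabulary order are illustrative choices, not code from pages/NLP.py.

```python
# Minimal sketch of one-hot vectorization over a toy vocabulary (illustrative only).
sentence = "I want to learn NLP"
vocabulary = sorted(set(sentence.split()))          # unique words in a fixed order
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word: str) -> list[int]:
    """Return a binary vector with a single 1 at the word's vocabulary position."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

for word in sentence.split():
    print(word, one_hot(word))
# Every vector has length len(vocabulary) and exactly one position set to 1.
```

In practice the vocabulary is built from the whole corpus, so these vectors grow with vocabulary size, which is why denser representations are often preferred for large vocabularies.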