Update pages/NLP.py
pages/NLP.py +1 -1
@@ -16,7 +16,7 @@ def show_page(page):
 st.title("Text preprocessing")
 st.markdown(
 """
-### Text preprocessing
+### Text preprocessing converts raw data into a suitable format for computer models to understand and process that data. It processes all the data while preserving the actual meaning and context of human language in numbers. This preprocessing is done in multiple steps, but the number of steps can vary depending on the nature of the text and the goals you want to achieve with NLP.
 - **Tokenization**: It breaks down text into smaller units called tokens. These tokens can be words, characters, or punctuation marks. For example, the sentence “I want to learn NLP.” would be tokenized into: I, want, to, learn, NLP,..
 - **Stop Words**: Stopwords are words without meaning in the text, such as “is”, “the”, and “and”. Removing these words makes it easier to focus on meaningful words.
 - **Stemming**: Stemming strips away suffixes and reduces words to their base form. For example, “going” will be reduced to “go”.
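The three steps described in the markdown string (tokenization, stop-word removal, stemming) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not code from `pages/NLP.py`: the names `STOP_WORDS`, `tokenize`, `remove_stop_words`, and `naive_stem` are hypothetical, the stop-word list is a tiny hand-picked sample, and the stemmer is a toy suffix rule rather than an established algorithm such as Porter stemming.

```python
import re

# Hypothetical, tiny stop-word list for illustration only;
# real lists contain far more entries.
STOP_WORDS = {"i", "is", "the", "and", "to"}


def tokenize(text):
    """Split text into word tokens and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in the stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix. A toy rule, assumed for this sketch."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least two characters so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


tokens = tokenize("I want to learn NLP.")
print(tokens)                     # ['I', 'want', 'to', 'learn', 'NLP', '.']
print(remove_stop_words(tokens))  # ['want', 'learn', 'NLP', '.']
print(naive_stem("going"))        # go
```

Note that the period comes out as its own token, and that the suffix rule will mis-stem many real words; a production pipeline would more likely rely on an NLP library for each step.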