DeDeckerThomas committed on
Commit · 3a6fbf2
1 Parent(s): 1b3bfc8
Update README.md
README.md CHANGED
@@ -85,18 +85,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
 
 ```python
 # Load pipeline
-model_name = "
+model_name = "ml6team/keyphrase-extraction-kbir-kpcrowd"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics,
-The fact that these methods have been widely used in the community
-
-
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
 """.replace(
 "\n", ""
 )
@@ -108,14 +109,18 @@ print(keyphrases)
 
 ```
 # Output
-['Artificial Intelligence' '
-'
-'
-'
+['Artificial Intelligence', 'Keyphrase extraction', 'NLP',
+'Transformers also', 'advantage', 'automate',
+'classical machine learning', 'community', 'context', 'document',
+'extract', 'extraction', 'extraction process', 'focus',
+'important', 'improvement', 'innovations', 'keyphrase',
+'keyphrases', 'libraries', 'linguistics', 'methods', 'process',
+'recent', 'semantics', 'statistics', 'technique', 'text',
+'text analysis', 'time-consuming', 'transformers', 'widely']
 ```
 
 ## 📚 Training Dataset
-KPCrowd is a
+KPCrowd is a broadcast news transcription dataset consisting of 500 English broadcast news stories from 10 different categories (art and culture, business, crime, fashion, health, politics us, politics world, science, sports, technology) with 50 docs per category. This dataset is annotated by multiple annotators that were required to look at the same news story and assign a set of keyphrases from the text itself.
 
 You can find more information here: https://huggingface.co/datasets/midas/kpcrowd and https://github.com/LIAAD/KeywordExtractor-Datasets.
 
@@ -218,4 +223,4 @@ The model achieves the following results on the Inspec test set:
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
 ## 🚨 Issues
-Please feel free to
+Please feel free to start discussions in the Community Tab.
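
A note on the inference snippet in this change: `.replace("\n", "")` removes line breaks without putting anything in their place, so the last word of one line and the first word of the next end up adjacent with no space between them. The sketch below illustrates this on a two-line excerpt of the sample text. `join_lines` is a hypothetical helper introduced here for illustration and is not part of the README; whether the fused words affect the extracted keyphrases is not something this diff shows.

```python
# Hypothetical helper for illustration only; it is not part of the README.
def join_lines(text: str, separator: str = " ") -> str:
    """Collapse a multi-line string into one line, joining lines with `separator`."""
    # Strip each line and drop the empty ones before joining.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return separator.join(lines)


sample = """
Currently, classical machine learning methods, that use statistics and linguistics,
are widely used for the extraction process.
"""

# What the README snippet does: newlines are removed outright.
fused = sample.replace("\n", "")

# Alternative, assuming a single space between lines is what is wanted.
spaced = join_lines(sample)

print(fused)
print(spaced)
```

If the spacing matters for a given input, passing the `spaced` form to the extractor instead of the `fused` one would be one option to try.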