Commit 5032a92 by DeDeckerThomas · 1 Parent(s): 63d88a0

Update README.md

Files changed (1)
  1. README.md +12 -13
README.md CHANGED
@@ -32,7 +32,7 @@ Keyphrase extraction is a technique in text analysis where you extract the impor
 
 
  ## 📓 Model Description
- This model is a fine-tuned distilbert model on the openkp dataset. More information can be found here: https://huggingface.co/distilbert-base-uncased.
+ This model is a fine-tuned distilbert model on the OpenKP dataset. More information can be found here: https://huggingface.co/distilbert-base-uncased.
 
  The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
 
@@ -79,18 +79,20 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
 
  ```python
  # Load pipeline
- model_name = "DeDeckerThomas/keyphrase-extraction-distilbert-openkp"
+ model_name = "ml6team/keyphrase-extraction-distilbert-openkp"
  extractor = KeyphraseExtractionPipeline(model=model_name)
  ```
+
  ```python
  # Inference
  text = """
  Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
- Since this is a time-consuming process, Artificial Intelligence is used to automate it.
- Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
- The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
- Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …),
- keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement.
+ Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+ Currently, classical machine learning methods, that use statistics and linguistics,
+ are widely used for the extraction process. The fact that these methods have been widely used in the community
+ has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+ transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+ and context of a document, which is quite an improvement.
  """.replace(
  "\n", ""
  )
@@ -102,10 +104,7 @@ print(keyphrases)
 
  ```
  # Output
- ['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
- 'classical machine learning' 'deep learning methods'
- 'keyphrase extraction' 'linguistics' 'recurrent neural networks'
- 'semantics' 'statistics' 'text analysis' 'transformers']
+ ['keyphrase extraction', 'text analysis']
  ```
 
  ## 📚 Training Dataset
@@ -163,7 +162,7 @@ def preprocess_fuction(all_samples_per_split):
  ```
 
  ### Postprocessing
- For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive B and Is. As last you strip the keyphrase to ensure all spaces are removed.
+ For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
  ```python
  # Define post_process functions
  def concat_tokens_by_tag(keyphrases):
@@ -207,4 +206,4 @@ The model achieves the following results on the OpenKP test set:
  For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
  ## 🚨 Issues
- Please feel free to contact Thomas De Decker for any problems with this model.
+ Please feel free to start discussions in the Community Tab.
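
Note on the Postprocessing hunk: the README text describes keeping the B and I labeled tokens, joining each B with the I tokens that follow it, and stripping the result. The body of `concat_tokens_by_tag` is not visible in this diff, so the following is only a minimal sketch of that idea, not the author's implementation. The function name `merge_bio_tokens`, the `(token, label)` input shape, and the choice to drop an I token that has no preceding B are all assumptions made for illustration.

```python
def merge_bio_tokens(tagged_tokens):
    """Group (token, label) pairs into keyphrases using B/I/O labels.

    Hypothetical helper: assumes labels are the plain strings "B", "I", "O".
    """
    phrases = []   # finished keyphrases, each a list of tokens
    current = []   # tokens of the keyphrase currently being built
    for token, label in tagged_tokens:
        if label == "B":
            # A new keyphrase starts; close the previous one if any.
            if current:
                phrases.append(current)
            current = [token]
        elif label == "I" and current:
            # Continue the keyphrase that is already open.
            current.append(token)
        else:
            # "O", or an "I" with no open keyphrase: close and skip.
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    # Join tokens with single spaces and strip surrounding whitespace.
    return [" ".join(tokens).strip() for tokens in phrases]


example = [
    ("Keyphrase", "B"), ("extraction", "I"), ("is", "O"), ("a", "O"),
    ("technique", "O"), ("in", "O"), ("text", "B"), ("analysis", "I"),
]
print(merge_bio_tokens(example))  # ['Keyphrase extraction', 'text analysis']
```

A real pipeline would work on subword token ids and decode them with the tokenizer rather than joining strings with spaces; this sketch uses whole-word strings only to keep the grouping logic easy to follow.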