---
license: apache-2.0
datasets:
- AyoubChLin/CNN_News_Articles_2011-2022
language:
- en
tags:
- topic modeling
- BERT
- CNN news articles
---
# BERTopic Model for CNN News Articles
This is a BERTopic model trained on CNN news articles. It uses the sentence-transformer model `all-MiniLM-L6-v2` to embed the documents and UMAP for dimensionality reduction.
## Usage
First, install the required packages:
```console
pip install sentence_transformers umap-learn bertopic
```
Then, load the model and encode your documents:
```python
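from sentence_transformers import SentenceTransformer
from umap import UMAP
from bertopic import BERTopic
# Load the sentence transformer model
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
# Set the random state in the UMAP model to prevent stochastic behavior
# (this object is not passed to the loaded model below; it matters when fitting a new model)
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine", random_state=42)
# Load the BERTopic model (replace the path with the location of your model file)
my_model = BERTopic.load("from/path/model.bin")
# Encode your documents (a list of strings; the two below are placeholders)
documents = ["First news article text.", "Second news article text."]
document_embeddings = sentence_model.encode(documents)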
```
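The embeddings computed above can be reused so that the model does not have to encode the documents a second time. The following is a minimal sketch, assuming `my_model`, `documents`, and `document_embeddings` from the snippet above; `assign_topics` is a hypothetical helper name, not part of BERTopic.

```python
from collections import defaultdict


def assign_topics(topic_model, docs, embeddings):
    """Group documents by predicted topic id, reusing precomputed embeddings.

    `topic_model` is expected to expose BERTopic's
    `transform(documents, embeddings)`, which returns (topics, probabilities).
    """
    topics, _ = topic_model.transform(docs, embeddings)
    grouped = defaultdict(list)
    for doc, topic_id in zip(docs, topics):
        grouped[int(topic_id)].append(doc)
    return dict(grouped)


# Usage with the objects defined above:
# docs_by_topic = assign_topics(my_model, documents, document_embeddings)
```

In BERTopic, topic id `-1` is reserved for outliers, so documents grouped under that key were not assigned to any topic.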
## Predict
```python
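sentence = "my sentence"
# Encode the sentence, then pass the precomputed embedding to the model
embeddings = sentence_model.encode([sentence])
# transform returns the predicted topic ids and their probabilities
topic, _ = my_model.transform([sentence], embeddings)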
```
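`transform` returns topic ids, which are easier to interpret next to each topic's keywords. The following is a minimal sketch, assuming `my_model` and `topic` from the snippet above. `top_words` is a hypothetical helper; it relies on BERTopic's `get_topic`, which returns a list of `(word, score)` pairs, or `False` when the id is unknown.

```python
def top_words(topic_model, topic_id, n=5):
    """Return up to n keywords for a topic, or an empty list if the id is unknown."""
    words = topic_model.get_topic(topic_id)  # list of (word, score) pairs, or False
    if not words:
        return []
    return [word for word, _ in words[:n]]


# Usage with the objects defined above:
# print(topic[0], top_words(my_model, topic[0]))
```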
For more information on how to use the BERTopic model, see the [BERTopic documentation](https://maartengr.github.io/BERTopic/index.html).