AngelPanizo committed on
Commit f28379f · verified · 1 parent: 795bb9c

Add BERTopic model

README.md ADDED
@@ -0,0 +1,160 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_SicilianGorillian2
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_SicilianGorillian2")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 91
+ * Number of training documents: 13362
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | biden - migrants - arrested - hamas - texas | 20 | -1_biden_migrants_arrested_hamas |
+ | 0 | blacks - racists - naacp - whiteness - melanin | 8570 | 0_blacks_racists_naacp_whiteness |
+ | 1 | transgenderism - minors - doctors - detransitioned - hormones | 236 | 1_transgenderism_minors_doctors_detransitioned |
+ | 2 | fbi - insurrection - pelosi - january - footage | 178 | 2_fbi_insurrection_pelosi_january |
+ | 3 | gunshots - swat - arrests - lewiston - killed | 135 | 3_gunshots_swat_arrests_lewiston |
+ | 4 | lampedusa - mussolini - milan - deportations - austria | 130 | 4_lampedusa_mussolini_milan_deportations |
+ | 5 | sinaloa - tamaulipas - cartel - zacatecas - sicario | 128 | 5_sinaloa_tamaulipas_cartel_zacatecas |
+ | 6 | teachers - lgbtqi - superintendent - indoctrinating - blackface | 113 | 6_teachers_lgbtqi_superintendent_indoctrinating |
+ | 7 | epstein - ghislaine - jpmorgan - blackmail - billionaire | 112 | 7_epstein_ghislaine_jpmorgan_blackmail |
+ | 8 | massachusetts - mbta - mayor - sanctuary - 7news | 106 | 8_massachusetts_mbta_mayor_sanctuary |
+ | 9 | unvaxxed - vaccinate - pfizer - ivermectin - injected | 99 | 9_unvaxxed_vaccinate_pfizer_ivermectin |
+ | 10 | petrodollar - currencies - bullion - greenback - yuan | 98 | 10_petrodollar_currencies_bullion_greenback |
+ | 11 | fentanyl - narcotics - sinaloa - smuggling - overdosed | 97 | 11_fentanyl_narcotics_sinaloa_smuggling |
+ | 12 | exploded - derailment - hazmat - flames - richland | 95 | 12_exploded_derailment_hazmat_flames |
+ | 13 | missiles - destroyers - submarine - drone - hormuz | 91 | 13_missiles_destroyers_submarine_drone |
+ | 14 | gaza - airstrikes - israeli - rafah - massacres | 89 | 14_gaza_airstrikes_israeli_rafah |
+ | 15 | viralnewsnyc - migrant - bronx - midtown - shelters | 89 | 15_viralnewsnyc_migrant_bronx_midtown |
+ | 16 | musk - dorsey - tweets - taibbi - shareholders | 82 | 16_musk_dorsey_tweets_taibbi |
+ | 17 | cbp - migrants - smugglers - illegally - fy2011 | 73 | 17_cbp_migrants_smugglers_illegally |
+ | 18 | kanye - chappelle - banned - zombieland - marilyn | 68 | 18_kanye_chappelle_banned_zombieland |
+ | 19 | soros - actblue - philanthropists - donated - fraudulently | 68 | 19_soros_actblue_philanthropists_donated |
+ | 20 | ukraine - kakhovka - sevastopol - zaporozhye - missiles | 68 | 20_ukraine_kakhovka_sevastopol_zaporozhye |
+ | 21 | illegals - naperville - mayor - pritzker - sanctuary | 67 | 21_illegals_naperville_mayor_pritzker |
+ | 22 | greenpeace - methane - junkscience - wheat - hottest | 66 | 22_greenpeace_methane_junkscience_wheat |
+ | 23 | patton - chickamauga - paratroopers - lifeofthecivilwar - 1942 | 66 | 23_patton_chickamauga_paratroopers_lifeofthecivilwar |
+ | 24 | illegals - panama - border - cartel - treason | 65 | 24_illegals_panama_border_cartel |
+ | 25 | migrants - juarez - cbp - crossing - illegally | 65 | 25_migrants_juarez_cbp_crossing |
+ | 26 | tucker - carlson - murdoch - shadowbanning - intellectuals | 65 | 26_tucker_carlson_murdoch_shadowbanning |
+ | 27 | deepfake - chatgpt - palantir - technology - websites | 62 | 27_deepfake_chatgpt_palantir_technology |
+ | 28 | ballots - mandamus - georgia - democrat - tampering | 60 | 28_ballots_mandamus_georgia_democrat |
+ | 29 | smuggled - arrested - hidalgo - deputies - txdps | 59 | 29_smuggled_arrested_hidalgo_deputies |
+ | 30 | texasattorneygeneral - border - brownsville - governor - jorge | 59 | 30_texasattorneygeneral_border_brownsville_governor |
+ | 31 | banks - bailout - blankfein - sivb - insolvent | 57 | 31_banks_bailout_blankfein_sivb |
+ | 32 | illegals - biden - amnesty - border - policies | 57 | 32_illegals_biden_amnesty_border |
+ | 33 | desantis - trumpism - romney - republican - ronna | 55 | 33_desantis_trumpism_romney_republican |
+ | 34 | republicans - pelosi - mccarthy - impeaching - committees | 54 | 34_republicans_pelosi_mccarthy_impeaching |
+ | 35 | trump - judge - indictment - blagojevich - viralnewsnyc | 54 | 35_trump_judge_indictment_blagojevich |
+ | 36 | drag - hookers - lgbt - beastiality - performs | 53 | 36_drag_hookers_lgbt_beastiality |
+ | 37 | biden - teleprompter - gaffe - jokes - gwen | 52 | 37_biden_teleprompter_gaffe_jokes |
+ | 38 | riots - parisians - gendarmes - algerian - cherbourg | 50 | 38_riots_parisians_gendarmes_algerian |
+ | 39 | ireland - finglas - derry - cathaoirleach - clonmel | 50 | 39_ireland_finglas_derry_cathaoirleach |
+ | 40 | pedophiles - pervert - busted - kidnapper - shane | 50 | 40_pedophiles_pervert_busted_kidnapper |
+ | 41 | boeing - airliner - pilots - a320 - hartsfield | 49 | 41_boeing_airliner_pilots_a320 |
+ | 42 | bidenlaptopreport - bribed - whistleblowers - joe - 140million | 47 | 42_bidenlaptopreport_bribed_whistleblowers_joe |
+ | 43 | migrants - border - yuma - smugglers - bisbee | 46 | 43_migrants_border_yuma_smugglers |
+ | 44 | manchin - senators - earmarks - moratorium - gulf | 43 | 44_manchin_senators_earmarks_moratorium |
+ | 45 | followers - myniggle - tweet - clickbait - stewpetersofficial | 39 | 45_followers_myniggle_tweet_clickbait |
+ | 46 | guns - disarmed - albuquerque - governor - rallied | 38 | 46_guns_disarmed_albuquerque_governor |
+ | 47 | trudeau - alberta - cbc - _mackenzie - heckled | 37 | 47_trudeau_alberta_cbc__mackenzie |
+ | 48 | teslas - megawatt - turbines - renewable - coal | 36 | 48_teslas_megawatt_turbines_renewable |
+ | 49 | nypd - robbery - mugger - punched - boardwalk | 36 | 49_nypd_robbery_mugger_punched |
+ | 50 | bolsonaro - brasilia - favelas - paulo - janeiro | 36 | 50_bolsonaro_brasilia_favelas_paulo |
+ | 51 | wikileaks - extradited - julian - journalists - belmarsh | 35 | 51_wikileaks_extradited_julian_journalists |
+ | 52 | gaza - exterminated - instagram - benshapiro - dumbass | 34 | 52_gaza_exterminated_instagram_benshapiro |
+ | 53 | trumpo - rushmore - melania - idiots - iceberg | 32 | 53_trumpo_rushmore_melania_idiots |
+ | 54 | ufc - mayweather - knockouts - backfist - cocksucker | 32 | 54_ufc_mayweather_knockouts_backfist |
+ | 55 | ukrainehumanrightsabuses - zelenskyy - overthrown - dmytro - ww3 | 32 | 55_ukrainehumanrightsabuses_zelenskyy_overthrown_dmytro |
+ | 56 | mosques - islamize - cathedral - christianization - mujahid | 32 | 56_mosques_islamize_cathedral_christianization |
+ | 57 | maui - hawaiians - waihee - wildfires - landowners | 32 | 57_maui_hawaiians_waihee_wildfires |
+ | 58 | afrikaners - ramaphosa - johannesburg - genocide - plaasmoorde | 31 | 58_afrikaners_ramaphosa_johannesburg_genocide |
+ | 59 | arrested - felon - molestation - usbp - laredo | 31 | 59_arrested_felon_molestation_usbp |
+ | 60 | telegram - channels - evans_baked_telegram - redpilldealer4833 - scammers | 31 | 60_telegram_channels_evans_baked_telegram_redpilldealer4833 |
+ | 61 | statues - charlottesville - tubman - defaced - removed | 31 | 61_statues_charlottesville_tubman_defaced |
+ | 62 | disney - bambi - blackwash - mermaid - reimagined | 30 | 62_disney_bambi_blackwash_mermaid |
+ | 63 | antisemite - splc - mossad - greenblatt - blacklisted | 30 | 63_antisemite_splc_mossad_greenblatt |
+ | 64 | migrants - reynosa - expires - haitian - title | 29 | 64_migrants_reynosa_expires_haitian |
+ | 65 | carjackers - stolen - robbers - oakland - kia | 28 | 65_carjackers_stolen_robbers_oakland |
+ | 66 | georgia - prosecutor - fulton - willis - subpoena | 27 | 66_georgia_prosecutor_fulton_willis |
+ | 67 | women - transgender - serena - powerlifters - navratilova | 27 | 67_women_transgender_serena_powerlifters |
+ | 68 | bidenflation - cpi - krugman - deficits - reduction | 27 | 68_bidenflation_cpi_krugman_deficits |
+ | 69 | illegals - taxpayers - billion - bankrupting - dreamers | 27 | 69_illegals_taxpayers_billion_bankrupting |
+ | 70 | mayor - nypd - adams - illegals - tenants | 26 | 70_mayor_nypd_adams_illegals |
+ | 71 | deficits - hyperinflation - treasury - trillion - yellen | 26 | 71_deficits_hyperinflation_treasury_trillion |
+ | 72 | army - enlisting - milley - recruited - rifleman | 26 | 72_army_enlisting_milley_recruited |
+ | 73 | prosecutors - plea - gun - misdemeanors - delaware | 26 | 73_prosecutors_plea_gun_misdemeanors |
+ | 74 | abortions - satanic - lujan - baals - ab2223 | 25 | 74_abortions_satanic_lujan_baals |
+ | 75 | died - stroke - defibrillators - henrique - jaguars | 25 | 75_died_stroke_defibrillators_henrique |
+ | 76 | shootings - transgender - nashville - martyr - aiden | 25 | 76_shootings_transgender_nashville_martyr |
+ | 77 | holocaust - treblinka - hitler - mengele - firebombed | 25 | 77_holocaust_treblinka_hitler_mengele |
+ | 78 | ftx - bankman - zuckerberg - laundering - scandal | 25 | 78_ftx_bankman_zuckerberg_laundering |
+ | 79 | whales - giraffes - alligator - rescued - manasquan | 24 | 79_whales_giraffes_alligator_rescued |
+ | 80 | jesus - freedom - rescuing - starring - traffickers | 24 | 80_jesus_freedom_rescuing_starring |
+ | 81 | antisemite - populism - roseanne - leibowitz - jevvs | 22 | 81_antisemite_populism_roseanne_leibowitz |
+ | 82 | aipac - zionism - netanyahu - mearsheimer - influencing | 22 | 82_aipac_zionism_netanyahu_mearsheimer |
+ | 83 | farmers - vlaardingerbroek - spain - protests - blockade | 22 | 83_farmers_vlaardingerbroek_spain_protests |
+ | 84 | resilience - blackpilled - hopelessness - rebel - rejoice | 21 | 84_resilience_blackpilled_hopelessness_rebel |
+ | 85 | tiktok - banning - chinafication - ceo - totalitarianism | 21 | 85_tiktok_banning_chinafication_ceo |
+ | 86 | blackrock - megacorp - larryfink - trillions - privatized | 21 | 86_blackrock_megacorp_larryfink_trillions |
+ | 87 | censorship - misinformation - reclaimthenet - obama - unconstitutionally | 20 | 87_censorship_misinformation_reclaimthenet_obama |
+ | 88 | budweiser - mulvaney - drinkers - sponsors - boycott | 20 | 88_budweiser_mulvaney_drinkers_sponsors |
+ | 89 | boston - children - firefighters - underreported - corpse | 20 | 89_boston_children_firefighters_underreported |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
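The Label column in the README's topic table does not appear to be written by hand. Each label looks like the topic ID followed by the first four keywords, joined with underscores. This seems to be BERTopic's default topic naming. It also explains the double underscore in topic 47, whose fourth keyword is `_mackenzie`.

The Topic Frequency column sums to 13,362, which is the stated number of training documents. So each document, including the 20 outliers in topic -1, is counted in exactly one row.

Below is a minimal sketch that rebuilds a label from one table row. `parse_topic_row` and `default_label` are illustrative helpers, not BERTopic functions. The row layout is assumed from the topic table.

```python
def parse_topic_row(row: str) -> tuple[int, list[str], int, str]:
    """Split one row of the topic-overview table into (id, keywords, frequency, label).

    Assumes the layout `| id | kw1 - kw2 - ... | frequency | label |`,
    optionally preceded by the "+" marker used in this diff.
    """
    row = row.strip()
    if row.startswith("+"):
        row = row[1:].strip()
    cells = [cell.strip() for cell in row.strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return int(topic_id), keywords.split(" - "), int(frequency), label


def default_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Rebuild a label as '<id>_' followed by the first n_words keywords joined by '_'."""
    return f"{topic_id}_" + "_".join(keywords[:n_words])


row = "| 47 | trudeau - alberta - cbc - _mackenzie - heckled | 37 | 47_trudeau_alberta_cbc__mackenzie |"
topic_id, keywords, frequency, label = parse_topic_row(row)
print(default_label(topic_id, keywords))           # 47_trudeau_alberta_cbc__mackenzie
print(default_label(topic_id, keywords) == label)  # True
```

Running the same check over every row is a cheap way to spot a label that was customized after training, since such a label would not follow this pattern.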
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "calculate_probabilities": true,
+ "language": null,
+ "low_memory": false,
+ "min_topic_size": 10,
+ "n_gram_range": [
+ 1,
+ 1
+ ],
+ "nr_topics": null,
+ "seed_topic_list": null,
+ "top_n_words": 10,
+ "verbose": false,
+ "zeroshot_min_similarity": 0.7,
+ "zeroshot_topic_list": null
+ }
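The config.json above stores the same hyperparameters as the README's Training hyperparameters list. JSON has no tuple type, so `n_gram_range` is stored as the list `[1, 1]`, while the README shows the tuple `(1, 1)`. Recent scikit-learn releases may reject a list for `CountVectorizer(ngram_range=...)`, so converting it back is prudent.

Below is a minimal sketch, assuming a BERTopic release whose constructor accepts these keyword arguments, including the zero-shot ones. `config_to_kwargs` and `load_config` are hypothetical helper names. config.json has no entries for the embedding, UMAP, or HDBSCAN models. The object built here is therefore an untrained model with the same settings, not a copy of the published one.

```python
import json
from pathlib import Path

# The values from config.json above (n_gram_range collapsed onto one line).
CONFIG_JSON = """{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}"""


def config_to_kwargs(config_text: str) -> dict:
    """Parse BERTopic config JSON into keyword arguments for BERTopic(...).

    JSON has no tuple type, so n_gram_range comes back as a list; convert it
    to the tuple form shown in the README.
    """
    config = json.loads(config_text)
    if config.get("n_gram_range") is not None:
        config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


def load_config(path: "str | Path") -> dict:
    """Same as config_to_kwargs, but reads a downloaded config.json file."""
    return config_to_kwargs(Path(path).read_text(encoding="utf-8"))


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"])  # (1, 1)

try:
    from bertopic import BERTopic

    # Untrained model with the same hyperparameters. config.json does not name
    # the embedding, UMAP or HDBSCAN models, so this is not the published model.
    fresh_model = BERTopic(**kwargs)
except ImportError:
    print("bertopic is not installed; skipping model construction")
```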
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:844ad2c71fcee7b9a8ed34abc5bffad24ad7bbba36706fa6fe25d2731bd4aaa9
+ size 1299872
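The `.safetensors` entries in this commit are Git LFS pointer files, not the binaries themselves. Each pointer has three `key value` lines: the spec version, the SHA-256 object ID, and the size in bytes.

Below is a minimal sketch that checks a downloaded file against its pointer. It assumes you have fetched the actual file (here `ctfidf.safetensors`) into the working directory. `parse_lfs_pointer` and `matches_pointer` are illustrative names, not part of any Git LFS tool.

```python
import hashlib
from pathlib import Path

# Pointer text for ctfidf.safetensors, as shown in this commit.
CTFIDF_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:844ad2c71fcee7b9a8ed34abc5bffad24ad7bbba36706fa6fe25d2731bd4aaa9
size 1299872
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file into its 'key value' fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: "str | Path", pointer_text: str) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(path)
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap size check before hashing
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


print(parse_lfs_pointer(CTFIDF_POINTER)["size"])  # 1299872
if Path("ctfidf.safetensors").exists():  # assumes a local download
    print(matches_pointer("ctfidf.safetensors", CTFIDF_POINTER))
```

The size comparison runs first because it is free and catches truncated downloads. The file is only hashed when sizes agree.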
ctfidf_config.json ADDED
The diff for this file is too large to render.
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de760b05833b62eb4c1bfeabde9e9b5a7d011e32bc2e32b544dae0e4da67f86f
+ size 372824
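Once downloaded, a `.safetensors` file such as `topic_embeddings.safetensors` can be inspected without loading the tensors. In the safetensors format, the file begins with an 8-byte little-endian unsigned integer N. Next comes an N-byte JSON header that maps each tensor name to its `dtype`, `shape`, and `data_offsets`, with an optional `__metadata__` entry.

Below is a minimal standard-library sketch. The tensor names it prints are whatever BERTopic stored, which this page does not show. `read_safetensors_header` and `describe` are illustrative names. The official `safetensors` package exposes the same information through `safe_open`.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path) -> dict:
    """Return the tensor index of a .safetensors file without reading tensor data.

    Layout: 8-byte little-endian unsigned header length N, then N bytes of
    UTF-8 JSON, then the raw tensor bytes.
    """
    with Path(path).open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header


def describe(path) -> None:
    """Print the name, dtype, shape and byte length of every stored tensor."""
    for name, info in read_safetensors_header(path).items():
        start, end = info["data_offsets"]
        print(f"{name}: dtype={info['dtype']} shape={info['shape']} bytes={end - start}")


if Path("topic_embeddings.safetensors").exists():  # assumes a local download
    describe("topic_embeddings.safetensors")
```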
topics.json ADDED
The diff for this file is too large to render.