davanstrien HF staff
metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

label_model

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/label_model")

topic_model.get_topic_info()
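Beyond the summary table, a single topic can be inspected with `get_topic`, which returns a list of `(word, score)` pairs, or `False` when the topic ID does not exist. The following is a minimal sketch; the helper name `top_keywords` is hypothetical and not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, top_n=5):
    """Return up to `top_n` keywords for one topic (hypothetical helper).

    `topic_model` is expected to be a loaded BERTopic model; only its
    `get_topic` method is used here.
    """
    words = topic_model.get_topic(topic_id)  # list of (word, score), or False
    if not words:
        # Unknown topic ID (or a topic without keywords): nothing to report.
        return []
    return [word for word, _score in words[:top_n]]
```

With the model loaded as above, `top_keywords(topic_model, 2)` would list the leading keywords of topic 2.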

Topic overview

  • Number of topics: 252
  • Number of training documents: 14986
Overview of all topics (keywords within a row are separated by " - "; topic -1 collects outlier documents):
Topic ID Topic Keywords Topic Frequency Label
-1 date - city - pre - heavy - fur 5 -1_date_city_pre_heavy
0 label_1 label_2 - label_0 label_1 label_2 - label_0 label_1 - label_1 - label_2 1333 0_label_1 label_2_label_0 label_1 label_2_label_0 label_1_label_1
1 label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 1043 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4
2 negative positive - positive negative - negative - positive - target 803 2_negative positive_positive negative_negative_positive
3 loc misc org - loc misc - misc org - misc - org loc 651 3_loc misc org_loc misc_misc org_misc
4 neutral positive - neutral - positive negative - negative - positive 479 4_neutral positive_neutral_positive negative_negative
5 label_0 - - - - 357 5_label_0___
6 contradiction - entailment - neutral - ambiguous - 348 6_contradiction_entailment_neutral_ambiguous
7 label_0 - - - - 334 7_label_0___
8 99 - - - - 326 8_99___
9 label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 300 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3
10 entailment - true - child - related - non 257 10_entailment_true_child_related
11 snake - dog - bear - wolf - sea 245 11_snake_dog_bear_wolf
12 label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_6 label_7 label_8 241 12_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6
13 loc misc org - loc misc - misc org - misc - org loc 229 13_loc misc org_loc misc_misc org_misc
14 weather - transfer - alarm - text - time 228 14_weather_transfer_alarm_text
15 label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 222 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2
16 delete - different - bad - related - rel 207 16_delete_different_bad_related
17 label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 172 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13
18 - - - - 166 18____
19 loc org loc - loc org - org loc - org - loc 142 19_loc org loc_loc org_org loc_org
20 label_6 label_60 label_61 - label_60 label_61 - label_62 label_63 - label_61 label_62 label_63 - label_61 label_62 126 20_label_6 label_60 label_61_label_60 label_61_label_62 label_63_label_61 label_62 label_63
21 label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 117 21_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3
22 test - second - - - 106 22_test_second__
23 forest - industrial - transport - low - bamboo 104 23_forest_industrial_transport_low
24 answer - header - question - quantity - 104 24_answer_header_question_quantity
25 healthy - leaf - rust - plant - spot 103 25_healthy_leaf_rust_plant
26 left - right - stop - yes - unknown 100 26_left_right_stop_yes
27 en - na - alpha - fan - lifestyle 93 27_en_na_alpha_fan
28 label_13 label_14 label_15 - label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 92 28_label_13 label_14 label_15_label_14 label_15_label_15_label_12 label_13 label_14
29 disease - bio - disorder - healthy - 86 29_disease_bio_disorder_healthy
30 work - group - person product - product - location 86 30_work_group_person product_product
31 fear joy - sadness surprise - anger fear - joy love - surprise 82 31_fear joy_sadness surprise_anger fear_joy love
32 common - non - different - - 78 32_common_non_different_
33 dis - - - - 76 33_dis___
34 - - - - 73 34____
35 restaurant - pizza - place - salad - food 69 35_restaurant_pizza_place_salad
36 cconj det intj - adj adp adv - det intj noun - det intj - noun num pron 66 36_cconj det intj_adj adp adv_det intj noun_det intj
37 label_17 label_18 label_19 - label_18 label_19 label_2 - label_18 label_19 - label_19 label_2 - label_16 label_17 label_18 66 37_label_17 label_18 label_19_label_18 label_19 label_2_label_18 label_19_label_19 label_2
38 ll - year - related - cause - delete 65 38_ll_year_related_cause
39 anger fear - joy love - surprise - joy - love 64 39_anger fear_joy love_surprise_joy
40 true - news - partial - - 64 40_true_news_partial_
41 - - - - 63 41____
42 label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 62 42_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_7 label_8 label_9
43 pos - neg - - - 62 43_pos_neg__
44 loc org - org - loc - date - sex 61 44_loc org_org_loc_date
45 label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 60 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2
46 event - group - person product - product - location 57 46_event_group_person product_product
47 bio - chemical - disease - effect - food 57 47_bio_chemical_disease_effect
48 234 - 19 20 21 - 20 21 22 - 22 23 24 - 23 24 57 48_234_19 20 21_20 21 22_22 23 24
49 fear happy neutral - happy neutral - fear happy - sad - happy 53 49_fear happy neutral_happy neutral_fear happy_sad
50 battery - volume - juice - chinese - korean 53 50_battery_volume_juice_chinese
51 menu - price - num - - 52 51_menu_price_num_
52 poor - ok - good - bad - great 52 52_poor_ok_good_bad
53 ll - cause - delete - unknown - 51 53_ll_cause_delete_unknown
54 hospital - unknown - en - material - digital 48 54_hospital_unknown_en_material
55 ll - cause - delete - unknown - 48 55_ll_cause_delete_unknown
56 self - question - neutral - yes - statement 48 56_self_question_neutral_yes
57 fat - loose - small - sugar - common 47 57_fat_loose_small_sugar
58 true - - - - 47 58_true___
59 cream - drinks - seafood - fruit - ice cream 46 59_cream_drinks_seafood_fruit
60 tr - ru - pers - pt - prod 46 60_tr_ru_pers_pt
61 - - - - 45 61____
62 clothing - care - kitchen - personal - health 44 62_clothing_care_kitchen_personal
63 business - news - tech - entertainment - sport 43 63_business_news_tech_entertainment
64 non - partial - neutral - yes - ok 43 64_non_partial_neutral_yes
65 organization person - location organization - organization - location - person 43 65_organization person_location organization_organization_location
66 daisy - tulip - rose - - 43 66_daisy_tulip_rose_
67 joy - sadness - anger - angry - happy 42 67_joy_sadness_anger_angry
68 samoyed - corgi - husky - pomeranian - golden 41 68_samoyed_corgi_husky_pomeranian
69 music - instrument - engine - wind - animals 41 69_music_instrument_engine_wind
70 hate - language - reporting - non - normal 41 70_hate_language_reporting_non
71 label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 41 71_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24
72 id - - - - 40 72_id___
73 animals - tech - dance - tiger - sport 40 73_animals_tech_dance_tiger
74 org org - loc loc - org - misc - loc 40 74_org org_loc loc_org_misc
75 star - positive - negative - negative positive - 38 75_star_positive_negative_negative positive
76 bird - ship - frog - horse - truck 37 76_bird_ship_frog_horse
77 cat - cats - dog - dogs - sleeping 37 77_cat_cats_dog_dogs
78 family - sports - music - related - health 37 78_family_sports_music_related
79 label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 37 79_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9
80 room - service - transport - care - kitchen 37 80_room_service_transport_care
81 positive - negative - neutral positive - neutral - positive negative 37 81_positive_negative_neutral positive_neutral
82 test - play - train - non - live 36 82_test_play_train_non
83 tim - evt - pro - gpe - org 36 83_tim_evt_pro_gpe
84 cold - disease - pressure - drug - blood 36 84_cold_disease_pressure_drug
85 non - early - late - - 35 85_non_early_late_
86 21 - office - 20 - 17 - 16 34 86_21_office_20_17
87 prep - nn - cc - pro - ex 34 87_prep_nn_cc_pro
88 evidence - position - statement - lead - request 33 88_evidence_position_statement_lead
89 adp - aux - sconj - cconj - det noun 33 89_adp_aux_sconj_cconj
90 job - start - help - address - quantity 33 90_job_start_help_address
91 gender - number - case - ind - person 33 91_gender_number_case_ind
92 threat - hate - adult - target - male 33 92_threat_hate_adult_target
93 institution - tools - organization - org - agent 32 93_institution_tools_organization_org
94 - - - - 32 94____
95 email - age - patient - state - zip 32 95_email_age_patient_state
96 mixed - positive - negative - neutral - neutral positive 32 96_mixed_positive_negative_neutral
97 test - help - joke - contact - report 32 97_test_help_joke_contact
98 address - balance - statement - request - second 31 98_address_balance_statement_request
99 - - - - 31 99____
100 hate - non - neutral - - 30 100_hate_non_neutral_
101 - - - - 30 101____
102 unk - zero - seven - 10 - blank 30 102_unk_zero_seven_10
103 male - female - young - adult - skin 30 103_male_female_young_adult
104 94 - 59 60 - 49 50 - 81 - 97 29 104_94_59 60_49 50_81
105 normal - cell - large - clean - lower 29 105_normal_cell_large_clean
106 lincoln - jaguar - audio - source - general 28 106_lincoln_jaguar_audio_source
107 title - section - header - list - item 28 107_title_section_header_list
108 - - - - 28 108____
109 yes - - - - 27 109_yes___
110 - - - - 26 110____
111 contradiction - entailment - neutral - non - 26 111_contradiction_entailment_neutral_non
112 instrument - org org - org org org - term - org 26 112_instrument_org org_org org org_term
113 ft - cardinal - act - loc - loc loc 25 113_ft_cardinal_act_loc
114 event - pro - pers - loc org - prod 25 114_event_pro_pers_loc org
115 ben - ext - exp - root - loc 25 115_ben_ext_exp_root
116 - - - - 25 116____
117 low - - - - 25 117_low___
118 ft - cardinal - act - loc - loc misc org 25 118_ft_cardinal_act_loc
119 statement - question - evidence - experience - answer 25 119_statement_question_evidence_experience
120 label_122 - label_121 - label_120 - label_123 - label_119 24 120_label_122_label_121_label_120_label_123
121 clean - - - - 24 121_clean___
122 ru - tr - el - en - hi 24 122_ru_tr_el_en
123 disgust - sadness surprise - joy love - surprise - joy 24 123_disgust_sadness surprise_joy love_surprise
124 statement - info - check - news - non 24 124_statement_info_check_news
125 motor - start - help - housing - yes 24 125_motor_start_help_housing
126 greek - chinese - italian - japanese - dutch 24 126_greek_chinese_italian_japanese
127 anger disgust - fear - disgust - sadness - anger 23 127_anger disgust_fear_disgust_sadness
128 date event - percent person - quantity - money - percent 23 128_date event_percent person_quantity_money
129 label_95 label_96 label_97 - label_97 label_98 label_99 - label_97 label_98 - label_94 label_95 label_96 - label_94 label_95 23 129_label_95 label_96 label_97_label_97 label_98 label_99_label_97 label_98_label_94 label_95 label_96
130 period - question - noun - number - 23 130_period_question_noun_number
131 neutral - - - - 22 131_neutral___
132 local - la - pad - data - personal 22 132_local_la_pad_data
133 partial - - - - 22 133_partial___
134 human - art - machine - - 22 134_human_art_machine_
135 fear joy - sadness surprise - surprise - disgust fear - joy 21 135_fear joy_sadness surprise_surprise_disgust fear
136 location organization - organization person - organization - price - disease 21 136_location organization_organization person_organization_price
137 14 15 16 - 12 13 14 - 13 14 15 - 11 12 13 - 10 11 12 21 137_14 15 16_12 13 14_13 14 15_11 12 13
138 sports - tech - business - sport - 21 138_sports_tech_business_sport
139 disorder - body - patient - age - disease 20 139_disorder_body_patient_age
140 sad - dis - sur - joy - 20 140_sad_dis_sur_joy
141 healthy - - - - 20 141_healthy___
142 drink - tea - wine - coffee - soft 20 142_drink_tea_wine_coffee
143 protein - chemical - cell - - 20 143_protein_chemical_cell_
144 rna - - - - 20 144_rna___
145 normal - covid - - - 20 145_normal_covid__
146 ex - pt - - - 20 146_ex_pt__
147 ok - ft - year - int - rel 20 147_ok_ft_year_int
148 header - currency - item - zip - state 20 148_header_currency_item_zip
149 label_122 label_123 - label_123 - label_122 - label_121 - label_120 19 149_label_122 label_123_label_123_label_122_label_121
150 anger disgust - anger disgust fear - disgust fear - disgust - sadness surprise 19 150_anger disgust_anger disgust fear_disgust fear_disgust
151 na - nn - ft - dis - bio 19 151_na_nn_ft_dis
152 angry - happy - sad - happy neutral - neutral 19 152_angry_happy_sad_happy neutral
153 organization percent person - organization percent - miscellaneous - percent person - percent 19 153_organization percent person_organization percent_miscellaneous_percent person
154 paper - metal - glass - tray - ticket 19 154_paper_metal_glass_tray
155 mask - normal - sharp - head - green 19 155_mask_normal_sharp_head
156 noun num pron - num pron propn - pron propn punct - num pron - adj adp adv 18 156_noun num pron_num pron propn_pron propn punct_num pron
157 answer - - - - 18 157_answer___
158 review - id - job - email - state 18 158_review_id_job_email
159 seven - queen - jack - king - war 18 159_seven_queen_jack_king
160 neg - nan - good - - 18 160_neg_nan_good_
161 ii - blank - vi - et - lower 18 161_ii_blank_vi_et
162 golden - husky - samoyed - pug - german 17 162_golden_husky_samoyed_pug
163 arg - delete - act - neg - lead 17 163_arg_delete_act_neg
164 exp - pp - intj - punc - prep 17 164_exp_pp_intj_punc
165 email - form - letter - report - news 17 165_email_form_letter_report
166 protein - rna - cell - line - type 17 166_protein_rna_cell_line
167 en - hi - fur - - 17 167_en_hi_fur_
168 - - - - 17 168____
169 - - - - 17 169____
170 loc loc - loc - pers - evt - 16 170_loc loc_loc_pers_evt
171 menu - - - - 16 171_menu___
172 normal - - - - 16 172_normal___
173 label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 16 173_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98
174 cell - organ - organism - tissue - disease 16 174_cell_organ_organism_tissue
175 target - instrument - opinion - price - product 16 175_target_instrument_opinion_price
176 org org - org org org - loc loc - org - prs 16 176_org org_org org org_loc loc_org
177 10 11 - 10 11 12 - 11 12 - 12 - 11 16 177_10 11_10 11 12_11 12_12
178 korean - russian - dutch - persian - french 16 178_korean_russian_dutch_persian
179 label_4 label_40 label_41 - label_39 label_4 label_40 - label_38 label_39 label_4 - label_37 label_38 label_39 - label_40 label_41 16 179_label_4 label_40 label_41_label_39 label_4 label_40_label_38 label_39 label_4_label_37 label_38 label_39
180 experience - location - loc misc org - loc misc - misc org 15 180_experience_location_loc misc org_loc misc
181 normal - pressure - high - water - 15 181_normal_pressure_high_water
182 company - institution - loc org - degree - org 15 182_company_institution_loc org_degree
183 short - sl - long - - 15 183_short_sl_long_
184 good - bad - non - - 15 184_good_bad_non_
185 149 - 151 - 191 - 199 - 231 15 185_149_151_191_199
186 unknown - vi - ii - - 15 186_unknown_vi_ii_
187 end - head - cross - - 15 187_end_head_cross_
188 forest - street - road - tree - mountain 15 188_forest_street_road_tree
189 label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 14 189_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10
190 prod - loc - evt - org org - loc loc 14 190_prod_loc_evt_org org
191 tech - business - sports - science - female 14 191_tech_business_sports_science
192 adult - child - young - - 14 192_adult_child_young_
193 human - organism - plants - - 14 193_human_organism_plants_
194 hot dog - chicken - hot - food - dog 14 194_hot dog_chicken_hot_food
195 rain - snow - - - 14 195_rain_snow__
196 objective - neutral - - - 14 196_objective_neutral__
197 pro - neutral - russian - attack - 14 197_pro_neutral_russian_attack
198 normal - disorder - good - - 14 198_normal_disorder_good_
199 road - good - bike - - 14 199_road_good_bike_
200 - - - - 14 200____
201 science - energy - arts - nuclear - systems 13 201_science_energy_arts_nuclear
202 - - - - 13 202____
203 event - ticket - ok - loose - non 13 203_event_ticket_ok_loose
204 neutral - left - right - unknown - 13 204_neutral_left_right_unknown
205 - - - - 13 205____
206 crime - pers - time - book - org 13 206_crime_pers_time_book
207 seven - start - record - zero - open 13 207_seven_start_record_zero
208 label_5 label_50 label_51 - label_50 label_51 label_52 - label_51 label_52 label_53 - label_51 label_52 - label_50 label_51 13 208_label_5 label_50 label_51_label_50 label_51 label_52_label_51 label_52 label_53_label_51 label_52
209 label_29 label_3 label_30 - label_26 label_27 label_28 - label_27 label_28 label_29 - label_27 label_28 - label_28 label_29 label_3 13 209_label_29 label_3 label_30_label_26 label_27 label_28_label_27 label_28 label_29_label_27 label_28
210 human - machine - - - 13 210_human_machine__
211 control - la - sin - social - ambient 13 211_control_la_sin_social
212 anger fear - sadness - anger - fear - fear joy 13 212_anger fear_sadness_anger_fear
213 panda - ticket - air - bamboo - el 13 213_panda_ticket_air_bamboo
214 target - - - - 13 214_target___
215 id - container - type - person - number 12 215_id_container_type_person
216 neutral - positive - negative - neutral positive - positive negative 12 216_neutral_positive_negative_neutral positive
217 change - bad - movement - work - science 12 217_change_bad_movement_work
218 rust - - - - 12 218_rust___
219 quantity - container - package - id - weight 12 219_quantity_container_package_id
220 text - - - - 12 220_text___
221 background - objective - - - 12 221_background_objective__
222 middle - subject - yes - request - answer 12 222_middle_subject_yes_request
223 - - - - 12 223____
224 public - ambiguous - non - person - 12 224_public_ambiguous_non_person
225 healthy - plant - pepper - spot - leaf 12 225_healthy_plant_pepper_spot
226 punc - prep - digit - latin - conj 12 226_punc_prep_digit_latin
227 location money - language - percent person - actor - money 12 227_location money_language_percent person_actor
228 - - - - 11 228____
229 punc - zero - pers - neg - reflex 11 229_punc_zero_pers_neg
230 album - major - copper - coon - common 11 230_album_major_copper_coon
231 metal - pop - country - dance - hip 11 231_metal_pop_country_dance
232 energy - common - grass - persian - removal 11 232_energy_common_grass_persian
233 man - double - bird - long - single 11 233_man_double_bird_long
234 17 - 16 - 18 - 13 - 15 11 234_17_16_18_13
235 email - actor - threat - tools - attack 11 235_email_actor_threat_tools
236 space - - - - 11 236_space___
237 type - country - jeep - van - lincoln 11 237_type_country_jeep_van
238 general - - - - 10 238_general___
239 ru - mat - - - 10 239_ru_mat__
240 contradiction - non - entailment - neutral - 10 240_contradiction_non_entailment_neutral
241 city - new - country - location - label_1 10 241_city_new_country_location
242 non - legal - sub - - 9 242_non_legal_sub_
243 tulip - cattle - motorcycle - road - color 8 243_tulip_cattle_motorcycle_road
244 item - color - cc - model - 8 244_item_color_cc_model
245 delivery - product - service - different - environment 7 245_delivery_product_service_different
246 degree - tim - neg - pos - propn 6 246_degree_tim_neg_pos
247 threat - hate - non - unknown - neutral 6 247_threat_hate_non_unknown
248 label_33 label_34 - label_32 label_33 label_34 - label_32 label_33 - label_31 label_32 label_33 - label_31 label_32 6 248_label_33 label_34_label_32 label_33 label_34_label_32 label_33_label_31 label_32 label_33
249 experience - location - - - 6 249_experience_location__
250 nat - gpe - geo - pro - tim 5 250_nat_gpe_geo_pro
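The rows above can be turned into structured records when the table is only available as plain text. The sketch below assumes each row has the layout `<topic id> <keywords separated by " - "> <frequency> <label>`, with the label starting with `<topic id>_`, and that no keyword is a bare `-`. The function name `parse_topic_row` is hypothetical.

```python
import re

# Assumed row layout: "<id> <kw> - <kw> - ... <frequency> <id>_<label rest>".
# The backreference \1 requires the label to start with the same topic ID.
ROW_PATTERN = re.compile(r"^(-?\d+) (.*) (\d+) (\1_.*)$")


def parse_topic_row(row):
    """Split one overview row into topic ID, keywords, frequency and label."""
    match = ROW_PATTERN.match(row.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {row!r}")
    topic_id, keyword_field, frequency, label = match.groups()

    # Group tokens between "-" separators; empty keyword slots are dropped.
    keywords, current = [], []
    for token in keyword_field.split():
        if token == "-":
            if current:
                keywords.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        keywords.append(" ".join(current))

    return {
        "topic_id": int(topic_id),
        "keywords": keywords,
        "frequency": int(frequency),
        "label": label,
    }
```

Rows whose keyword slots are all empty, such as topic 18, yield an empty keyword list rather than an error.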

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
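The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below collects them in a dictionary and builds a fresh, unfitted model from it. The names `HYPERPARAMETERS` and `build_untrained_model` are hypothetical. The embedding model, any custom sub-models, and the training documents are not listed in this card, so this is a starting point under those assumptions and not a way to reproduce the published model.

```python
# Values copied from the hyperparameter list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_untrained_model():
    """Create an unfitted BERTopic instance with the settings above."""
    # Imported lazily so the dictionary can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of strings `docs` would then train a new model with these settings.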

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
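To compare a local environment against the versions listed above, something like the following sketch may help. It uses only the standard library; the names `PINNED` and `version_mismatches` are hypothetical, the keys are the PyPI distribution names of the listed packages, and the Python version itself is not checked.

```python
from importlib import metadata

# Package versions copied from the list above, keyed by PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result from `version_mismatches()` means every listed package is installed at the listed version.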