--- base_model: manuel-couto-pintos/roberta_erisk datasets: [] language: [] library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:30288 - loss:MultipleNegativesRankingLoss widget: - source_sentence: 'Looks like a small cockroach, but much more colorful, 0.75" long. [Atlanta, Georgia] ' sentences: - 'Help me win a bet: What size gi does Marcelo Garcia wear? I suspect he uses different size pants relative to the gi-top because of his epic thighs relative to stature. My buddy just says A2 all around (on average, recognizing that it varies by brand). What do you say? ' - 'What little things about the Star Wars Universe do you love? ' - 'Looks like a small cockroach, but much more colorful, 0.75" long. [Atlanta, Georgia] ' - source_sentence: "Clogged Construction on my brand new condo finished this summer.\ \ Not wasting a second, I broke lease on my musky apartment, and moved in as soon\ \ as possible. I rather enjoyed knowing I was the first resident living here:\ \ there was no wear and tear, no smoke stains on the walls, and no damage to the\ \ structure. The only issue was a light clattering sound whenever I used the commercial\ \ sink in my laundry room. I rarely used it, so I didn't bring up the problem\ \ to the contractors. Everything else worked perfectly, and my home was as sterile\ \ as an operating table.\n\n\n\n nbsp;\n\n\n\nAfter a few months, I began noticing\ \ water pooling at the foot of my shower. The drain must have been clogged. I\ \ took to my tools, unscrewed the shower drain, and peered inside. I could see\ \ a collection of fibers bunched up in the pipes. Reaching in with an unfolded\ \ coat hanger, I pulled out mountains of dirty blond hair clogging the pipes.\ \ I live alone, I don't have any pets, I haven't entertained a lady in over a\ \ year, and I've been bald since I was 27.\n\n\n\n nbsp;\n\n\n\nThe odd phenomena\ \ got me thinking about the sink in the laundry room. I detached the aerator,\ \ placed my hand under the faucet, and turned on the water. Dozens of molars came\ \ flying out, slipping through my fingers and into the sink, bouncing up and down\ \ until ultimately falling down the drain.\n\n\n\n nbsp;\n\n\n\nOn a completely\ \ unrelated note: I have a beautiful, fully furnished, barely-used condo for sale.\ \ Located in downtown Detroit. Anyone interested? " sentences: - '3-2 defense cannot stop corner 3s? Does anyone else have this problem? My down low guys won''t kick out to even try to defend an open 3 shot, and the computer just spams this on me all day when I play offline. ' - tw.being suicidal but knowing someone whos commit is the worst thing in the world. bc you see both sides. you see how it affects the people that love that person. including yourself. you see how it doesnt end the pain but it just passes it on to all the people who are left to deal with it. but then it also makes it so much more understandable as to why someone did it. you know what its like to want the pain to end. the feeling of your brain sabotaging you and your happiness constantly. to stop feeling like youre drowning in yourself. you get each and every point to it. and in a sense it makes me feel even more guilty for ever having the thought in the first place. for it becoming my safe space. 
knowing that if things dont fall into place that im okay with not being here anymore but not being okay leaving the people you love to clean up the mess / carry it with them for the rest of their lives. sorry. end rant. - "Clogged Construction on my brand new condo finished this summer. Not wasting\ \ a second, I broke lease on my musky apartment, and moved in as soon as possible.\ \ I rather enjoyed knowing I was the first resident living here: there was no\ \ wear and tear, no smoke stains on the walls, and no damage to the structure.\ \ The only issue was a light clattering sound whenever I used the commercial sink\ \ in my laundry room. I rarely used it, so I didn't bring up the problem to the\ \ contractors. Everything else worked perfectly, and my home was as sterile as\ \ an operating table.\n\n\n\n nbsp;\n\n\n\nAfter a few months, I began noticing\ \ water pooling at the foot of my shower. The drain must have been clogged. I\ \ took to my tools, unscrewed the shower drain, and peered inside. I could see\ \ a collection of fibers bunched up in the pipes. Reaching in with an unfolded\ \ coat hanger, I pulled out mountains of dirty blond hair clogging the pipes.\ \ I live alone, I don't have any pets, I haven't entertained a lady in over a\ \ year, and I've been bald since I was 27.\n\n\n\n nbsp;\n\n\n\nThe odd phenomena\ \ got me thinking about the sink in the laundry room. I detached the aerator,\ \ placed my hand under the faucet, and turned on the water. Dozens of molars came\ \ flying out, slipping through my fingers and into the sink, bouncing up and down\ \ until ultimately falling down the drain.\n\n\n\n nbsp;\n\n\n\nOn a completely\ \ unrelated note: I have a beautiful, fully furnished, barely-used condo for sale.\ \ Located in downtown Detroit. Anyone interested? " - source_sentence: 'Top 10 Movies Trailers of 2017 Must watch It ' sentences: - Im on coke n 2 mg kpin and im anxious as fuckIdk what i can do to get rid of this i know coke doesnt last long but the anxietys lingering n the kpins are keeping me borderline okay, but I've never been this anxious on coke i feel like im on a psychedelic having a bad trip but im not tripping its just the anxiety. Can anyone help me thru this - '[Giveaway] 10 BTS for new users ' - 'Top 10 Movies Trailers of 2017 Must watch It ' - source_sentence: 'Vet says he nearly operated on himself when VA wouldn''t pay medical bill. ' sentences: - 'What kind of soap is best to get glitter off your skin? ' - 'Alvvays is nearly done tracking their next album ' - 'Vet says he nearly operated on himself when VA wouldn''t pay medical bill. 
' - source_sentence: Age old questions[View Poll](https://www.reddit.com/poll/m89hf3) sentences: - "GUYS I MIGHT HAVE TO DELETE THIS ACCOUNT BECAUSE MY BF KNOWS MY ACC BUT I DON'T\ \ WANT TO IT'S A MASSIVE URGENCE I'VE HAD THIS 3 YEARS So basically me and my\ \ boyfriend was messing around but he decided to go onto my reddit app and he\ \ \"accidently\" saw my reddit account name and he said that he's not going to\ \ look cause he knows he won't like what he sees but GUYS my post history is fucked\ \ i'm fucked it makes me look more fucked then I am what the fuck do i dooooo\ \ D:\n\nI don't wanna start over and there's a couple of subreddits that are suscriber\ \ only so how the fuck am i gonna get back \n\nhe's said he's been curious about\ \ this before but he knows the sorta stuff i post and he said it would really\ \ upset him but when he's curios he usally won't stop wondering but I like to\ \ think that i can trust him but I''m complety FUCKED. \n\napparently he forgot\ \ it too but he has good memory " - Age old questions[View Poll](https://www.reddit.com/poll/m89hf3) - 'Who else is in a opposite gender dominated industry? What have been your experiences? I am a female in IT. I chose this field because I enjoy it, and it turns out I am good at it. I am not concerned about the gender bias because I feel my qualifications and experience speak for themselves, and so far that has been the case (the only time I have been discriminated against it has not affected my career progress). However, I''m relatively inexperienced and I would love to know other people''s experiences in similar environments. ' --- # SentenceTransformer based on manuel-couto-pintos/roberta_erisk This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [manuel-couto-pintos/roberta_erisk](https://huggingface.co/manuel-couto-pintos/roberta_erisk). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [manuel-couto-pintos/roberta_erisk](https://huggingface.co/manuel-couto-pintos/roberta_erisk) - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 768 tokens - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. 
```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("manuel-couto-pintos/roberta_erisk_simcse") # Run inference sentences = [ 'Age old questions[View Poll](https://www.reddit.com/poll/m89hf3)', 'Age old questions[View Poll](https://www.reddit.com/poll/m89hf3)', "Who else is in a opposite gender dominated industry? What have been your experiences? I am a female in IT. I chose this field because I enjoy it, and it turns out I am good at it. I am not concerned about the gender bias because I feel my qualifications and experience speak for themselves, and so far that has been the case (the only time I have been discriminated against it has not affected my career progress). However, I'm relatively inexperienced and I would love to know other people's experiences in similar environments. ", ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 30,288 training samples * Columns: sentence_0 and sentence_1 * Approximate statistics based on the first 1000 samples: | | sentence_0 | sentence_1 | |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------| | type | string | string | | details | | | * Samples: | sentence_0 | sentence_1 | |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Actor Cory Monteith, Who Played Finn Hudson On 'Glee,' Found Dead | Actor Cory Monteith, Who Played Finn Hudson On 'Glee,' Found Dead | | Is the AW3420DW worth double the cost of a $500 monitor?I've been researching ultrawides and wanted to know people's opinion if the extra cost for the [Alienware AW3420DW ($999)](https://www.microcenter.com/product/620684/dell-alienware-aw3420dw-34-wqhd-120hz-hdmi-dp-g-sync--curved-ips-led-gaming-monitor) was worth the extra over say a [AOC CU34G2X 
($449)](https://www.microcenter.com/product/618536/aoc-cu34g2x-34-qhd-144hz-hdmi-dp-freesync-ultrawide-curved-led-gaming-monitor) or [BenQ EX3501R ($649)](https://www.bhphotovideo.com/c/product/1383775-REG/benq_ex3501r_premium_grey_35_va_3440x1440.html) or another monitor in that range? If I'm willing to spend the cash for the Alienware, should I just make the leap? | Is the AW3420DW worth double the cost of a $500 monitor?I've been researching ultrawides and wanted to know people's opinion if the extra cost for the [Alienware AW3420DW ($999)](https://www.microcenter.com/product/620684/dell-alienware-aw3420dw-34-wqhd-120hz-hdmi-dp-g-sync--curved-ips-led-gaming-monitor) was worth the extra over say a [AOC CU34G2X ($449)](https://www.microcenter.com/product/618536/aoc-cu34g2x-34-qhd-144hz-hdmi-dp-freesync-ultrawide-curved-led-gaming-monitor) or [BenQ EX3501R ($649)](https://www.bhphotovideo.com/c/product/1383775-REG/benq_ex3501r_premium_grey_35_va_3440x1440.html) or another monitor in that range? If I'm willing to spend the cash for the Alienware, should I just make the leap? | | My first time making it to a week! Awesome! Nothing to say, just felt like sharing(: Have a good day!
<br><br><br>**EDIT:** Oh my gosh, I meant to say month... Woops. | My first time making it to a week! Awesome! Nothing to say, just felt like sharing(: Have a good day!<br><br><br>**EDIT:** Oh my gosh, I meant to say month... Woops. |

* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
Click to expand - `overwrite_output_dir`: False - `do_predict`: False - `eval_strategy`: no - `prediction_loss_only`: True - `per_device_train_batch_size`: 10 - `per_device_eval_batch_size`: 10 - `per_gpu_train_batch_size`: None - `per_gpu_eval_batch_size`: None - `gradient_accumulation_steps`: 1 - `eval_accumulation_steps`: None - `torch_empty_cache_steps`: None - `learning_rate`: 5e-05 - `weight_decay`: 0.0 - `adam_beta1`: 0.9 - `adam_beta2`: 0.999 - `adam_epsilon`: 1e-08 - `max_grad_norm`: 1 - `num_train_epochs`: 1 - `max_steps`: -1 - `lr_scheduler_type`: linear - `lr_scheduler_kwargs`: {} - `warmup_ratio`: 0.0 - `warmup_steps`: 0 - `log_level`: passive - `log_level_replica`: warning - `log_on_each_node`: True - `logging_nan_inf_filter`: True - `save_safetensors`: True - `save_on_each_node`: False - `save_only_model`: False - `restore_callback_states_from_checkpoint`: False - `no_cuda`: False - `use_cpu`: False - `use_mps_device`: False - `seed`: 42 - `data_seed`: None - `jit_mode_eval`: False - `use_ipex`: False - `bf16`: False - `fp16`: False - `fp16_opt_level`: O1 - `half_precision_backend`: auto - `bf16_full_eval`: False - `fp16_full_eval`: False - `tf32`: None - `local_rank`: 0 - `ddp_backend`: None - `tpu_num_cores`: None - `tpu_metrics_debug`: False - `debug`: [] - `dataloader_drop_last`: False - `dataloader_num_workers`: 0 - `dataloader_prefetch_factor`: None - `past_index`: -1 - `disable_tqdm`: False - `remove_unused_columns`: True - `label_names`: None - `load_best_model_at_end`: False - `ignore_data_skip`: False - `fsdp`: [] - `fsdp_min_num_params`: 0 - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - `fsdp_transformer_layer_cls_to_wrap`: None - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - `deepspeed`: None - `label_smoothing_factor`: 0.0 - `optim`: adamw_torch - `optim_args`: None - `adafactor`: False - `group_by_length`: False - `length_column_name`: length - `ddp_find_unused_parameters`: None - `ddp_bucket_cap_mb`: None - `ddp_broadcast_buffers`: False - `dataloader_pin_memory`: True - `dataloader_persistent_workers`: False - `skip_memory_metrics`: True - `use_legacy_prediction_loop`: False - `push_to_hub`: False - `resume_from_checkpoint`: None - `hub_model_id`: None - `hub_strategy`: every_save - `hub_private_repo`: False - `hub_always_push`: False - `gradient_checkpointing`: False - `gradient_checkpointing_kwargs`: None - `include_inputs_for_metrics`: False - `eval_do_concat_batches`: True - `fp16_backend`: auto - `push_to_hub_model_id`: None - `push_to_hub_organization`: None - `mp_parameters`: - `auto_find_batch_size`: False - `full_determinism`: False - `torchdynamo`: None - `ray_scope`: last - `ddp_timeout`: 1800 - `torch_compile`: False - `torch_compile_backend`: None - `torch_compile_mode`: None - `dispatch_batches`: None - `split_batches`: None - `include_tokens_per_second`: False - `include_num_input_tokens_seen`: False - `neftune_noise_alpha`: None - `optim_target_modules`: None - `batch_eval_metrics`: False - `eval_on_start`: False - `eval_use_gather_object`: False - `batch_sampler`: batch_sampler - `multi_dataset_batch_sampler`: round_robin
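
The card lists the loss and trainer settings but not the training script itself. For reference, here is a minimal sketch of an equivalent setup using the `SentenceTransformerTrainer` API from Sentence Transformers 3.x; the two toy `(sentence_0, sentence_1)` pairs and the `output_dir` are placeholders, not the actual 30,288-pair training data.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    util,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Start from the base model named at the top of this card.
model = SentenceTransformer("manuel-couto-pintos/roberta_erisk")

# Placeholder pairs standing in for the 30,288 (sentence_0, sentence_1) training samples.
train_dataset = Dataset.from_dict({
    "sentence_0": ["first example post", "second example post"],
    "sentence_1": ["first example post", "second example post"],
})

# MultipleNegativesRankingLoss with the parameters reported above (scale=20.0, cosine similarity).
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)

# Key non-default hyperparameters from the list above; everything else stays at its default.
args = SentenceTransformerTrainingArguments(
    output_dir="roberta_erisk_simcse",  # placeholder output directory
    num_train_epochs=1,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

With this loss, each `sentence_1` in a batch acts as the positive for its own `sentence_0` and as an in-batch negative for the other nine examples, which is why the training pairs can simply duplicate the same text (SimCSE-style positives).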
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1651 | 500  | 0.8614        |
| 0.3301 | 1000 | 0.0012        |
| 0.4952 | 1500 | 0.0007        |
| 0.6603 | 2000 | 0.0002        |
| 0.8254 | 2500 | 0.0002        |
| 0.9904 | 3000 | 0.0           |

### Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.44.2
- PyTorch: 2.0.1+cu117
- Accelerate: 0.32.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
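
As noted in the model description, the embeddings can also be used for semantic search rather than only pairwise similarity. Below is a minimal sketch using `util.semantic_search`, complementing the embedding example in the Usage section above; the corpus and query strings are illustrative placeholders.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("manuel-couto-pintos/roberta_erisk_simcse")

# Illustrative corpus and query; replace with your own documents.
corpus = [
    "What kind of soap is best to get glitter off your skin?",
    "Top 10 movie trailers of 2017, must watch.",
    "Alvvays is nearly done tracking their next album.",
]
query = "How do I wash glitter off my hands?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries for the query (cosine similarity).
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```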