license: apache-2.0
language: en
tags:
- sentence similarity
library_name: sentence-transformers
pipeline_tag: sentence-similarity
# Dataset Collection:
* The news dataset is collected from this Kaggle [dataset](https://www.kaggle.com/competitions/fake-news/data).
* The dataset has the news title, the news content, and a label (the label shows the cosine similarity between the news title and the news content).
* Different strategies have been followed during the data gathering phase.
# Sentence transformer fine-tuned for semantic search and sentence similarity
* The model is fine-tuned on the news dataset described above.
* This model can be used for semantic search, sentence similarity, and recommendation systems.
* This model can be used for inference as well.
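As a rough illustration of the semantic search use case, the sketch below ranks a small corpus against a query by cosine similarity. The helper names `rank_hits` and `search` are hypothetical (they are not part of sentence-transformers), and the model is only loaded when `search` is called.

```python
def rank_hits(scores, k=3):
    """Return the k highest-scoring (index, score) pairs, best first."""
    indexed = list(enumerate(scores))
    indexed.sort(key=lambda pair: pair[1], reverse=True)
    return indexed[:k]


def search(query, corpus, k=3):
    """Rank the corpus sentences by cosine similarity to the query (hypothetical helper)."""
    # Imported lazily so rank_hits stays usable without the library installed.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("Sakil/sentence_similarity_semantic_search")
    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    # cos_sim returns a 1 x len(corpus) matrix; take its only row as a plain list.
    scores = util.cos_sim(query_emb, corpus_emb)[0].tolist()
    return [(corpus[i], score) for i, score in rank_hits(scores, k)]
```

Calling `search("A man eats pasta.", ["A man is eating food.", "A monkey is playing drums."], k=1)` should return the corpus sentence the model considers closest to the query, together with its score.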
# Data Fields:
* **label**: cosine similarity between the news title and the news content
* **news title**: the title of the news
* **news content**: the content of the news
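For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows the formula in plain Python. The vectors are made-up toy values rather than real title or content embeddings, and this card does not state how the labels were actually computed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for a title embedding and a content embedding (assumed values).
title_vec = [1.0, 2.0, 0.0]
content_vec = [2.0, 4.0, 0.0]
print(cosine_similarity(title_vec, content_vec))
```

Vectors pointing in the same direction score close to 1, unrelated (orthogonal) vectors score 0, and opposite vectors score close to -1.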
# Application:
* This model is useful for semantic search, sentence similarity, and recommendation systems.
* You can fine-tune this model for your particular use cases.
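One possible way to fine-tune further is sketched below, using the classic `fit` API of sentence-transformers with `CosineSimilarityLoss`. This is not necessarily how the model was originally trained. The helper names `rows_to_pairs` and `fine_tune`, the dictionary keys (taken from the Data Fields section), and the hyperparameters are all assumptions to adapt to your own data.

```python
def rows_to_pairs(rows):
    """Turn dataset rows into (title, content, label) tuples, skipping incomplete rows."""
    pairs = []
    for row in rows:
        # Key names are assumed to match the fields described in this card.
        title = (row.get("news title") or "").strip()
        content = (row.get("news content") or "").strip()
        if title and content and row.get("label") is not None:
            pairs.append((title, content, float(row["label"])))
    return pairs


def fine_tune(pairs, epochs=1, batch_size=16):
    """Continue training on (text_a, text_b, similarity) pairs (hypothetical helper)."""
    # Imported lazily so rows_to_pairs can be used without the heavy dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("Sakil/sentence_similarity_semantic_search")
    examples = [InputExample(texts=[a, b], label=score) for a, b, score in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    # CosineSimilarityLoss pushes the cosine similarity of each pair toward its label.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=epochs, warmup_steps=10)
    return model
```

The returned model can then be stored with `model.save("my-finetuned-model")`, where the output path is a placeholder.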
# Model Implementation
Install the library first: `pip install -U sentence-transformers`
```python
from sentence_transformers import SentenceTransformer, util

model_name = "Sakil/sentence_similarity_semantic_search"
# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer(model_name)

sentences = ['A man is eating food.',
             'A man is eating a piece of bread.',
             'The girl is carrying a baby.',
             'A man is riding a horse.',
             'A woman is playing violin.',
             'Two men pushed carts through the woods.',
             'A man is riding a white horse on an enclosed ground.',
             'A monkey is playing drums.',
             'Someone in a gorilla costume is playing a set of drums.'
             ]

# Encode all sentences
embeddings = model.encode(sentences)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)

# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], score))
```
# Github: [Sakil Ansari](https://github.com/Sakil786/sentence_similarity_semantic_search)