init commit

- all-MiniLM-L6-v2.pth +3 -0
- answers_texts.txt +5 -0
- app.py +93 -0
- model.py +41 -0
- questions_texts.txt +5 -0
- requirements.txt +4 -0
all-MiniLM-L6-v2.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5f89bf803b7cb61d4e1d39a94ed8c88bdb7a6f6392403abd8cc726d1fc509d05
+size 90895911
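This file is a Git LFS pointer; the actual ~91 MB checkpoint is what app.py loads with `torch.load`. A minimal sketch of how such a state-dict checkpoint could have been produced is below; this is an assumption, not part of the commit, reusing the factory from model.py:

```python
# Sketch (assumption): producing a checkpoint like all-MiniLM-L6-v2.pth
# from the model defined in model.py.
import torch
from model import create_semantic_ranking_model

model, tokenizer = create_semantic_ranking_model()
# Saves only the state dict, matching model.load_state_dict(...) in app.py
torch.save(model.state_dict(), "all-MiniLM-L6-v2.pth")
```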
answers_texts.txt
ADDED
@@ -0,0 +1,5 @@
+> For example , why ca n't Microsoft Word open a .odt file ? There is zero financial incentive for Microsoft to support competing formats . There is considerable financial incentive for them to try to lock customers up in proprietary Microsoft formats , however . > What happens when a file gets corrupted / unstable ? Depending on how the file was corrupted , it can sometimes be repaired or at least partially salvaged . In some cases , even minor corruption can render the file utterly unreadable ( encrypted file containers , for example ) . > Why not have just one format for all videos and one format for all pictures ? Well , that sure would be nice , would n't it ? Trouble is , there is no single standards body in charge of these sorts of things , and anyone is free to invent their own proprietary format ( which is exactly what has led us to the dizzying array of competing formats that we know and love today ) .
+They hire an experts in the country they 're trying to do business in . By selling merchandise in France they agree to follow all the rules and regulations set down by that country . To even import their products they need a legal and accounting team to figure out the laws and fees associated with doing so . Generally most companies make a subsidiary in the country they are doing business in so something like " Apple France " just worries about the French specifics of doing business there .
+I 've heard of two cases where they have . One was a [ dolphin ] ( URL_0 ) , which stopped breathing intentionally . The other was a bear that was having its bile drained for some eastern medicine thing . Not just once , that was basically the bear 's life , being a perpetually wounded source . One day , she killed her cub , then slammed her head into the bars of her cage until she died . She 's not the only one , and that 's not the only [ method of choice ] ( URL_1 ) ... ( Note that I 'm not counting certain defense mechanisms , like exploding ants or some bees that die when they sting , as suicide per se . )
+Ok , let say that the doors are marked A , B and C. Suppose you first pick door number A. Now lets look at the possibilities . i ) The car is behind door A. Then switching the door means you loose , but staying on A means you win . ii ) The car is behind door B. Then switching the door means you win , but staying on A means you loose . iii ) The car is behind door C. Then switching the door means you win , but staying on A means you loose . In 2 out of 3 cases , switching the door mean you win , but in only 1 out of 3 cases you 'll win if you stay . That 's the reason you have an advantage by switching doors .
+[ I can . ] ( URL_0 ) I developed a [ strabismus ] ( URL_1 ) ( squint , turn ) at a young age and had surgery to correct it which was only partially successful . My right eye is much more [ myopic ] ( URL_2 ) than my left , so my brain tends to ignore that eye . The muscles holding my right eye in alignment relax , and the eye drifts out . This messes with my stereoscopic vision , and I see double . When I become aware of seeing double , I can pull my eyes back into alignment . In the video , all I 'm doing is relaxing and straining those muscles . ' One eye out ' is actually the default position , when the muscles are relaxed . Then I strain extra hard to bring it further in than it should be . I wear glasses / contacts which correct the myopia in my bad eye , and it happens involuntarily less often , but it becomes extremely hard to control when I 'm tired or drunk . I 've never seen a 3D movie that worked for me . Something to do with the eye drifting when I 'm focussed on a point for too long- i.e. the screen . However , I can make ' Magic Eye ' type stereograms work straight away without blinking .
app.py
ADDED
@@ -0,0 +1,93 @@
+import gradio as gr
+import os
+import torch
+import pickle
+import gzip
+
+from torch.nn.functional import cosine_similarity
+from model import create_semantic_ranking_model
+from timeit import default_timer as timer
+from typing import Tuple, Dict
+
+### Load example texts ###
+questions_texts = []
+with open("questions_texts.txt", "r") as file:
+    questions_texts = [line.strip() for line in file.readlines()]
+
+answers_texts = []
+with open("answers_texts.txt", "r") as file:
+    answers_texts = [line.strip() for line in file.readlines()]
+
+### Model and transforms preparation ###
+# Create model and tokenizer
+model, tokenizer = create_semantic_ranking_model()
+
+# Load saved weights
+model.load_state_dict(
+    torch.load(f="all-MiniLM-L6-v2.pth",
+               map_location=torch.device("cpu"))  # load to CPU
+)
+
+# Load the precomputed response embeddings
+with gzip.open('response_embeddings.pkl.gz', 'rb') as f:
+    response_embeddings = pickle.load(f)
+
+# Load the response list
+with gzip.open('response_list.pkl.gz', 'rb') as f:
+    response_list = pickle.load(f)
+
+### Predict function ###
+def predict(text) -> Tuple[Dict, str, float]:
+    # Start a timer
+    start_time = timer()
+
+    # Set the model to eval mode
+    model.eval()
+
+    # Tokenize the input text
+    tokenized_inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
+
+    # Get input embeddings
+    with torch.inference_mode():
+        input_embeddings = model(**tokenized_inputs)
+
+    # Compute similarity scores against all precomputed response embeddings
+    similarity_scores = cosine_similarity(input_embeddings.unsqueeze(1), response_embeddings.unsqueeze(0), dim=2)
+    top_responses_indices = torch.topk(similarity_scores, k=5, dim=1).indices.squeeze()
+
+    # Retrieve the actual response texts
+    top_responses = [response_list[idx] for idx in top_responses_indices]
+
+    # Look up the ground-truth answer if the input matches an example question
+    actual_response = None
+    for question, answer in zip(questions_texts, answers_texts):
+        if text.strip() == question.strip():
+            actual_response = answer
+            break
+
+    # Calculate pred time
+    end_time = timer()
+    pred_time = round(end_time - start_time, 4)
+
+    # Return one value per Gradio output: top responses, ground-truth answer, pred time
+    return {"Top Responses": top_responses}, actual_response, pred_time
+
+### Gradio app ###
+# Create title, description and article
+title = "Semantic Ranking with MiniLM-L6-v2"
+description = "[An all-MiniLM-L6-v2 sentence embedding model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) (based on MiniLM-L6-H384-uncased) trained to rank responses from [HuggingFace 🤗 Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3). [Source Code Found Here](https://colab.research.google.com/drive/1o5a9zH1TxzaxLKV5AFUhZE8L8yMnO9Jw?usp=sharing)"
+article = "Built with [Gradio](https://github.com/gradio-app/gradio) and [PyTorch](https://pytorch.org/). [Source Code Found Here](https://colab.research.google.com/drive/1o5a9zH1TxzaxLKV5AFUhZE8L8yMnO9Jw?usp=sharing)"
+
+# Create the Gradio demo
+demo = gr.Interface(fn=predict,
+                    inputs=gr.Textbox(lines=2, placeholder="Type your text here..."),
+                    outputs=[gr.JSON(label="Top Responses"),
+                             gr.Textbox(label="Actual Response", interactive=False),
+                             gr.Number(label="Prediction time (s)")],
+                    examples=questions_texts,  # use the example questions loaded above
+                    title=title,
+                    description=description,
+                    article=article)
+
+# Launch the demo
+demo.launch()
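app.py loads `response_embeddings.pkl.gz` and `response_list.pkl.gz`, but neither file is added by this commit. A minimal sketch of how those artifacts could be precomputed with the same model follows; the `candidate_responses` list is a hypothetical stand-in for the HC3 responses:

```python
# Sketch (assumption): precomputing the response embeddings and response
# list that app.py expects. `candidate_responses` is hypothetical; in
# practice it would be drawn from the Hello-SimpleAI/HC3 dataset.
import gzip
import pickle
import torch
from model import create_semantic_ranking_model

model, tokenizer = create_semantic_ranking_model()
model.eval()

candidate_responses = ["First candidate answer.", "Second candidate answer."]

# Tokenize with the same settings app.py uses for queries
tokenized = tokenizer(candidate_responses, return_tensors="pt",
                      max_length=128, truncation=True, padding="max_length")
with torch.inference_mode():
    embeddings = model(**tokenized)  # shape: [num_responses, 384]

# Save the tensor and the parallel list of texts as gzipped pickles
with gzip.open("response_embeddings.pkl.gz", "wb") as f:
    pickle.dump(embeddings, f)
with gzip.open("response_list.pkl.gz", "wb") as f:
    pickle.dump(candidate_responses, f)
```

With a `[num_responses, 384]` tensor saved this way, the `unsqueeze`/`cosine_similarity`/`topk` pipeline in `predict` lines up shape-for-shape.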
model.py
ADDED
@@ -0,0 +1,41 @@
+import torch
+from torch import nn
+from transformers import AutoTokenizer, AutoModel
+
+class AutoModelForSentenceEmbedding(nn.Module):
+    def __init__(self, model):
+        super().__init__()
+
+        self.model = model
+
+    def forward(self, **kwargs):
+        model_output = self.model(**kwargs)
+        embeddings = self.mean_pooling(model_output, kwargs['attention_mask'])
+        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)  # L2-normalize so cosine similarity reduces to a dot product
+        return embeddings
+
+    def mean_pooling(self, model_output, attention_mask):
+        token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+def create_semantic_ranking_model(device=device):
+    """Creates a HuggingFace all-MiniLM-L6-v2 sentence embedding model.
+
+    Args:
+        device: A torch.device to place the model on.
+    Returns:
+        A tuple of (model, tokenizer).
+    """
+    tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
+    model = AutoModelForSentenceEmbedding(AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')).to(device)
+
+    # Freeze the base model's parameters
+    for param in model.model.parameters():
+        param.requires_grad = False
+
+    return model, tokenizer
+
+# Example usage (guarded so importing this module does not build the model a second time)
+if __name__ == "__main__":
+    model, tokenizer = create_semantic_ranking_model()
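Because the forward pass mean-pools the token embeddings and then L2-normalizes, every output vector has unit norm, so cosine similarity between two embeddings is just their dot product. A quick usage sketch (the example sentences are arbitrary):

```python
# Usage sketch: embed two sentences and compare them with a dot product.
import torch
from model import create_semantic_ranking_model

model, tokenizer = create_semantic_ranking_model()
inputs = tokenizer(["How do magnets work?", "Why do magnets attract iron?"],
                   return_tensors="pt", truncation=True, padding=True)
with torch.inference_mode():
    emb = model(**inputs)  # shape: [2, 384], rows are L2-normalized

print(emb.norm(dim=1))  # ~tensor([1., 1.]) because of the normalize step
print(emb[0] @ emb[1])  # cosine similarity, since both vectors are unit-length
```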
questions_texts.txt
ADDED
@@ -0,0 +1,5 @@
+Formatting on computers For example , why ca n't Microsoft Word open a .odt file ? What happens when a file gets corrupted / unstable ? Why not have just one format for all videos and one format for all pictures ? Explain like I'm five.
+How do companies like apple keep track of laws in different countries ? in light of france suing apple due to a law in france that say that planned obsolescense is illegal , how can an international company keep track of every thing Explain like I'm five.
+Can animals commit suicide ? i always wondered if they can , like can animal become so sad that 's he eventually kill himself ? Explain like I'm five.
+the Monty Hall problem I think I get it , but then I ... don't . Why would it be advantageous to switch after one door is opened . Sure , it 's 50/50 instead of 1/3 , but it just does n't make sense to me . Explain like I'm five.
+Why ca n't we move our eyes independently ? How come both of our eyes move in the same direction ? Is this muscle related or nerve related ? Also is there any benefit to this or is it simply necessary for us . Please explain like I'm five.
requirements.txt
ADDED
@@ -0,0 +1,4 @@
+torch==2.1.0
+torchvision==0.16.0
+gradio==3.50.2
+transformers==4.35.0