LukeOLuck committed
Commit 083997e · 1 Parent(s): b598f90

init commit

Files changed (6)
  1. all-MiniLM-L6-v2.pth +3 -0
  2. answers_texts.txt +5 -0
  3. app.py +93 -0
  4. model.py +41 -0
  5. questions_texts.txt +5 -0
  6. requirements.txt +4 -0
all-MiniLM-L6-v2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f89bf803b7cb61d4e1d39a94ed8c88bdb7a6f6392403abd8cc726d1fc509d05
+ size 90895911
answers_texts.txt ADDED
@@ -0,0 +1,5 @@
+ > For example , why ca n't Microsoft Word open a .odt file ? There is zero financial incentive for Microsoft to support competing formats . There is considerable financial incentive for them to try to lock customers up in proprietary Microsoft formats , however . > What happens when a file gets corrupted / unstable ? Depending on how the file was corrupted , it can sometimes be repaired or at least partially salvaged . In some cases , even minor corruption can render the file utterly unreadable ( encrypted file containers , for example ) . > Why not have just one format for all videos and one format for all pictures ? Well , that sure would be nice , would n't it ? Trouble is , there is no single standards body in charge of these sorts of things , and anyone is free to invent their own proprietary format ( which is exactly what has led us to the dizzying array of competing formats that we know and love today ) .
+ They hire an experts in the country they 're trying to do business in . By selling merchandise in France they agree to follow all the rules and regulations set down by that country . To even import their products they need a legal and accounting team to figure out the laws and fees associated with doing so . Generally most companies make a subsidiary in the country they are doing business in so something like " Apple France " just worries about the French specifics of doing business there .
+ I 've heard of two cases where they have . One was a [ dolphin ] ( URL_0 ) , which stopped breathing intentionally . The other was a bear that was having its bile drained for some eastern medicine thing . Not just once , that was basically the bear 's life , being a perpetually wounded source . One day , she killed her cub , then slammed her head into the bars of her cage until she died . She 's not the only one , and that 's not the only [ method of choice ] ( URL_1 ) ... ( Note that I 'm not counting certain defense mechanisms , like exploding ants or some bees that die when they sting , as suicide per se . )
+ Ok , let say that the doors are marked A , B and C. Suppose you first pick door number A. Now lets look at the possibilities . i ) The car is behind door A. Then switching the door means you loose , but staying on A means you win . ii ) The car is behind door B. Then switching the door means you win , but staying on A means you loose . iii ) The car is behind door C. Then switching the door means you win , but staying on A means you loose . In 2 out of 3 cases , switching the door mean you win , but in only 1 out of 3 cases you 'll win if you stay . That 's the reason you have an advantage by switching doors .
+ [ I can . ] ( URL_0 ) I developed a [ strabismus ] ( URL_1 ) ( squint , turn ) at a young age and had surgery to correct it which was only partially successful . My right eye is much more [ myopic ] ( URL_2 ) than my left , so my brain tends to ignore that eye . The muscles holding my right eye in alignment relax , and the eye drifts out . This messes with my stereoscopic vision , and I see double . When I become aware of seeing double , I can pull my eyes back into alignment . In the video , all I 'm doing is relaxing and straining those muscles . ' One eye out ' is actually the default position , when the muscles are relaxed . Then I strain extra hard to bring it further in than it should be . I wear glasses / contacts which correct the myopia in my bad eye , and it happens involuntarily less often , but it becomes extremely hard to control when I 'm tired or drunk . I 've never seen a 3D movie that worked for me . Something to do with the eye drifting when I 'm focussed on a point for too long- i.e. the screen . However , I can make ' Magic Eye ' type stereograms work straight away without blinking .
app.py ADDED
@@ -0,0 +1,93 @@
+ import gradio as gr
+ import os
+ import torch
+ import pickle
+ import gzip
+
+ from torch.nn.functional import cosine_similarity
+ from model import create_semantic_ranking_model
+ from timeit import default_timer as timer
+ from typing import List, Optional, Tuple
+
+ ### Load example texts ###
+ questions_texts = []
+ with open("questions_texts.txt", "r") as file:
+     questions_texts = [line.strip() for line in file.readlines()]
+
+ answers_texts = []
+ with open("answers_texts.txt", "r") as file:
+     answers_texts = [line.strip() for line in file.readlines()]
+
+ ### Model and transforms preparation ###
+ # Create model and tokenizer on the CPU, matching the map_location used below
+ model, tokenizer = create_semantic_ranking_model(device="cpu")
+
+ # Load saved weights
+ model.load_state_dict(
+     torch.load(f="all-MiniLM-L6-v2.pth",
+                map_location=torch.device("cpu"))  # load to CPU
+ )
+
+ # Load the embeddings
+ with gzip.open('response_embeddings.pkl.gz', 'rb') as f:
+     response_embeddings = pickle.load(f)
+
+ # Load the response list
+ with gzip.open('response_list.pkl.gz', 'rb') as f:
+     response_list = pickle.load(f)
+
+ ### Predict function ###
+ def predict(text) -> Tuple[List[str], Optional[str], float]:
+     # Start a timer
+     start_time = timer()
+
+     # Set the model to eval
+     model.eval()
+
+     # Set up the inputs
+     tokenized_inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
+
+     # Get input_embeddings
+     with torch.inference_mode():
+         input_embeddings = model(**tokenized_inputs)
+
+     # Compute similarity scores: (1, 1, dim) against (1, N, dim) gives a (1, N) score matrix
+     similarity_scores = cosine_similarity(input_embeddings.unsqueeze(1), response_embeddings.unsqueeze(0), dim=2)
+     top_responses_indices = torch.topk(similarity_scores, k=5, dim=1).indices.squeeze()
+
+     # Retrieve the actual response texts
+     top_responses = [response_list[idx] for idx in top_responses_indices.tolist()]
+
+     # Get actual response (only available when the input matches an example question)
+     actual_response = None
+     for question, answer in zip(questions_texts, answers_texts):
+         if text.strip() == question.strip():
+             actual_response = answer
+             break
+
+     # Calculate pred time
+     end_time = timer()
+     pred_time = round(end_time - start_time, 4)
+
+     # Return one value per output component: top responses, actual response, pred time
+     return top_responses, actual_response, pred_time
+
+ ### Gradio app ###
+ # Create title, description and article
+ title = "Semantic Ranking with MiniLM-L6-v2"
+ description = "A sentence embedding model based on [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) (MiniLM-L6-H384-uncased), trained to rank responses from [HuggingFace 🤗 Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3). [Source Code Found Here](https://colab.research.google.com/drive/1o5a9zH1TxzaxLKV5AFUhZE8L8yMnO9Jw?usp=sharing)"
+ article = "Built with [Gradio](https://github.com/gradio-app/gradio) and [PyTorch](https://pytorch.org/). [Source Code Found Here](https://colab.research.google.com/drive/1o5a9zH1TxzaxLKV5AFUhZE8L8yMnO9Jw?usp=sharing)"
+
+ # Create the Gradio demo
+ demo = gr.Interface(fn=predict,
+                     inputs=gr.Textbox(lines=2, placeholder="Type your text here..."),
+                     outputs=[gr.JSON(label="Top Responses"),
+                              gr.Textbox(label="Actual Response", interactive=False),
+                              gr.Number(label="Prediction time (s)")],
+                     examples=[[question] for question in questions_texts],
+                     title=title,
+                     description=description,
+                     article=article)
+
+ # Launch the demo
+ demo.launch()
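The ranking step in predict (cosine similarity against every stored response embedding, then keep the best k) can be sketched without PyTorch. This is only an illustrative, dependency-free sketch: the helper names `cosine` and `top_k_responses` and the 2-D toy vectors are assumptions made up for the example and are not part of this commit.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_responses(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    responses: Sequence[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every response against the query and return the k best, highest first."""
    scored = [(responses[i], cosine(query, emb)) for i, emb in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data, invented for illustration only
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_responses = ["east", "north", "north-east"]
ranked = top_k_responses([0.9, 0.1], toy_embeddings, toy_responses, k=2)
```

In app.py the same idea runs as one batched tensor operation (cosine_similarity followed by torch.topk), which is the sensible choice once the response list is large.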
model.py ADDED
@@ -0,0 +1,41 @@
+ import torch
+ from torch import nn
+ from transformers import AutoTokenizer, AutoModel
+
+ class AutoModelForSentenceEmbedding(nn.Module):
+     def __init__(self, model):
+         super().__init__()
+         self.model = model
+
+     def forward(self, **kwargs):
+         model_output = self.model(**kwargs)
+         embeddings = self.mean_pooling(model_output, kwargs['attention_mask'])
+         embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
+         return embeddings
+
+     def mean_pooling(self, model_output, attention_mask):
+         token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+         input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+         return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ def create_semantic_ranking_model(device=device):
+     """Creates a HuggingFace all-MiniLM-L6-v2 model with frozen weights.
+
+     Args:
+         device: Device to place the model on (a torch.device or a string such as "cpu")
+     Returns:
+         A tuple of the model and tokenizer
+     """
+     tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
+     model = AutoModelForSentenceEmbedding(AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')).to(device)
+
+     for param in model.model.parameters():
+         param.requires_grad = False
+
+     return model, tokenizer
+
+ # Example usage (runs only when model.py is executed directly, not on import)
+ if __name__ == "__main__":
+     model, tokenizer = create_semantic_ranking_model()
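The pooling in AutoModelForSentenceEmbedding averages token embeddings while ignoring padded positions, then scales the result to unit length. A minimal pure-Python sketch of that arithmetic for a single sentence follows; the function names `mean_pool` and `l2_normalize` and the toy numbers are assumptions chosen for illustration, and the two epsilon values mirror the clamp in model.py and the default of torch.nn.functional.normalize.

```python
import math
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padded tokens are skipped."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for j in range(dim):
                summed[j] += vector[j]
    # Same guard as torch.clamp(..., min=1e-9): avoids dividing by zero on an empty mask
    count = max(float(sum(attention_mask)), 1e-9)
    return [value / count for value in summed]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (epsilon guards the zero vector)."""
    norm = max(math.sqrt(sum(v * v for v in vector)), 1e-12)
    return [v / norm for v in vector]


# Hypothetical toy input: two real tokens followed by one padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
```

Because the padding vector is masked out, its large values do not affect the pooled result; with unit-length embeddings, cosine similarity reduces to a plain dot product.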
questions_texts.txt ADDED
@@ -0,0 +1,5 @@
+ Formatting on computers For example , why ca n't Microsoft Word open a .odt file ? What happens when a file gets corrupted / unstable ? Why not have just one format for all videos and one format for all pictures ? Explain like I'm five.
+ How do companies like apple keep track of laws in different countries ? in light of france suing apple due to a law in france that say that planned obsolescense is illegal , how can an international company keep track of every thing Explain like I'm five.
+ Can animals commit suicide ? i always wondered if they can , like can animal become so sad that 's he eventually kill himself ? Explain like I'm five.
+ the Monty Hall problem I think I get it , but then I ... don't . Why would it be advantageous to switch after one door is opened . Sure , it 's 50/50 instead of 1/3 , but it just does n't make sense to me . Explain like I'm five.
+ Why ca n't we move our eyes independently ? How come both of our eyes move in the same direction ? Is this muscle related or nerve related ? Also is there any benefit to this or is it simply necessary for us . Please explain like I'm five.
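app.py pairs line i of this file with line i of answers_texts.txt by position, so the two files have to stay in the same order. One way to make that pairing explicit is a dictionary lookup, sketched below. The helper names `build_answer_lookup` and `find_actual_response` and the sample strings are hypothetical and not part of this commit.

```python
from typing import Dict, Optional, Sequence


def build_answer_lookup(questions: Sequence[str], answers: Sequence[str]) -> Dict[str, str]:
    """Map each stripped question to the answer on the same line number."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same number of lines")
    return {q.strip(): a.strip() for q, a in zip(questions, answers)}


def find_actual_response(text: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the stored answer for an exact (whitespace-trimmed) question match."""
    return lookup.get(text.strip())


# Hypothetical sample lines, invented for illustration only
sample_questions = ["Why is the sky blue ?\n", "What is a prime number ?\n"]
sample_answers = ["Air scatters blue light more .\n", "A number with exactly two divisors .\n"]
lookup = build_answer_lookup(sample_questions, sample_answers)
found = find_actual_response("  What is a prime number ?  ", lookup)
missing = find_actual_response("Unseen question", lookup)
```

The length check surfaces a mismatch between the two files at startup, whereas a plain zip would silently drop the extra lines.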
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ torch==2.1.0
+ torchvision==0.16.0
+ gradio==3.50.2
+ transformers==4.35.0