init commit

- all-MiniLM-L6-v2.pth +3 -0
- answers_texts.txt +5 -0
- app.py +93 -0
- model.py +41 -0
- questions_texts.txt +5 -0
- requirements.txt +4 -0
all-MiniLM-L6-v2.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5f89bf803b7cb61d4e1d39a94ed8c88bdb7a6f6392403abd8cc726d1fc509d05
+size 90895911
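This file is a Git LFS pointer; the actual ~91 MB checkpoint is what app.py loads with `torch.load`. A minimal sketch of how such a state-dict checkpoint could have been produced is below; this is an assumption, not part of the commit, reusing the factory from model.py:

```python
# Sketch (assumption): producing a checkpoint like all-MiniLM-L6-v2.pth
# from the model defined in model.py.
import torch
from model import create_semantic_ranking_model

model, tokenizer = create_semantic_ranking_model()
# Saves only the state dict, matching model.load_state_dict(...) in app.py
torch.save(model.state_dict(), "all-MiniLM-L6-v2.pth")
```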
answers_texts.txt
ADDED
@@ -0,0 +1,5 @@
+> For example , why ca n't Microsoft Word open a .odt file ? There is zero financial incentive for Microsoft to support competing formats . There is considerable financial incentive for them to try to lock customers up in proprietary Microsoft formats , however . > What happens when a file gets corrupted / unstable ? Depending on how the file was corrupted , it can sometimes be repaired or at least partially salvaged . In some cases , even minor corruption can render the file utterly unreadable ( encrypted file containers , for example ) . > Why not have just one format for all videos and one format for all pictures ? Well , that sure would be nice , would n't it ? Trouble is , there is no single standards body in charge of these sorts of things , and anyone is free to invent their own proprietary format ( which is exactly what has led us to the dizzying array of competing formats that we know and love today ) .
+They hire an experts in the country they 're trying to do business in . By selling merchandise in France they agree to follow all the rules and regulations set down by that country . To even import their products they need a legal and accounting team to figure out the laws and fees associated with doing so . Generally most companies make a subsidiary in the country they are doing business in so something like " Apple France " just worries about the French specifics of doing business there .
+I 've heard of two cases where they have . One was a [ dolphin ] ( URL_0 ) , which stopped breathing intentionally . The other was a bear that was having its bile drained for some eastern medicine thing . Not just once , that was basically the bear 's life , being a perpetually wounded source . One day , she killed her cub , then slammed her head into the bars of her cage until she died . She 's not the only one , and that 's not the only [ method of choice ] ( URL_1 ) ... ( Note that I 'm not counting certain defense mechanisms , like exploding ants or some bees that die when they sting , as suicide per se . )
+Ok , let say that the doors are marked A , B and C. Suppose you first pick door number A. Now lets look at the possibilities . i ) The car is behind door A. Then switching the door means you loose , but staying on A means you win . ii ) The car is behind door B. Then switching the door means you win , but staying on A means you loose . iii ) The car is behind door C. Then switching the door means you win , but staying on A means you loose . In 2 out of 3 cases , switching the door mean you win , but in only 1 out of 3 cases you 'll win if you stay . That 's the reason you have an advantage by switching doors .
+[ I can . ] ( URL_0 ) I developed a [ strabismus ] ( URL_1 ) ( squint , turn ) at a young age and had surgery to correct it which was only partially successful . My right eye is much more [ myopic ] ( URL_2 ) than my left , so my brain tends to ignore that eye . The muscles holding my right eye in alignment relax , and the eye drifts out . This messes with my stereoscopic vision , and I see double . When I become aware of seeing double , I can pull my eyes back into alignment . In the video , all I 'm doing is relaxing and straining those muscles . ' One eye out ' is actually the default position , when the muscles are relaxed . Then I strain extra hard to bring it further in than it should be . I wear glasses / contacts which correct the myopia in my bad eye , and it happens involuntarily less often , but it becomes extremely hard to control when I 'm tired or drunk . I 've never seen a 3D movie that worked for me . Something to do with the eye drifting when I 'm focussed on a point for too long- i.e. the screen . However , I can make ' Magic Eye ' type stereograms work straight away without blinking .
app.py
ADDED
@@ -0,0 +1,93 @@
+import gradio as gr
+import os
+import torch
+import pickle
+import gzip
+
+from torch.nn.functional import cosine_similarity
+from model import create_semantic_ranking_model
+from timeit import default_timer as timer
+from typing import Tuple, Dict
+
+### Load example texts ###
+questions_texts = []
+with open("questions_texts.txt", "r") as file:
+    questions_texts = [line.strip() for line in file.readlines()]
+
+answers_texts = []
+with open("answers_texts.txt", "r") as file:
+    answers_texts = [line.strip() for line in file.readlines()]
+
+### Model and transforms preparation ###
+# Create model and tokenizer
+model, tokenizer = create_semantic_ranking_model()
+
+# Load saved weights
+model.load_state_dict(
+    torch.load(f="all-MiniLM-L6-v2.pth",
+               map_location=torch.device("cpu"))  # load to CPU
+)
+
+# Load the precomputed response embeddings
+with gzip.open('response_embeddings.pkl.gz', 'rb') as f:
+    response_embeddings = pickle.load(f)
+
+# Load the response list
+with gzip.open('response_list.pkl.gz', 'rb') as f:
+    response_list = pickle.load(f)
+
+### Predict function ###
+def predict(text) -> Tuple[Dict, str, float]:
+    # Start a timer
+    start_time = timer()
+
+    # Set the model to eval mode
+    model.eval()
+
+    # Tokenize the input text
+    tokenized_inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
+
+    # Get input embeddings
+    with torch.inference_mode():
+        input_embeddings = model(**tokenized_inputs)
+
+    # Compute similarity scores against all precomputed response embeddings
+    similarity_scores = cosine_similarity(input_embeddings.unsqueeze(1), response_embeddings.unsqueeze(0), dim=2)
+    top_responses_indices = torch.topk(similarity_scores, k=5, dim=1).indices.squeeze()
+
+    # Retrieve the actual response texts
+    top_responses = [response_list[idx] for idx in top_responses_indices]
+
+    # Look up the ground-truth answer if the input matches an example question
+    actual_response = None
+    for question, answer in zip(questions_texts, answers_texts):
+        if text.strip() == question.strip():
+            actual_response = answer
+            break
+
+    # Calculate pred time
+    end_time = timer()
+    pred_time = round(end_time - start_time, 4)
+
+    # Return one value per Gradio output: top responses, ground-truth answer, pred time
+    return {"Top Responses": top_responses}, actual_response, pred_time
+
+### Gradio app ###
+# Create title, description and article
+title = "Semantic Ranking with MiniLM-L6-v2"
+description = "[An all-MiniLM-L6-v2 sentence embedding model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) (based on MiniLM-L6-H384-uncased) trained to rank responses from [HuggingFace 🤗 Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3). [Source Code Found Here](https://colab.research.google.com/drive/1o5a9zH1TxzaxLKV5AFUhZE8L8yMnO9Jw?usp=sharing)"
+article = "Built with [Gradio](https://github.com/gradio-app/gradio) and [PyTorch](https://pytorch.org/). [Source Code Found Here](https://colab.research.google.com/drive/1o5a9zH1TxzaxLKV5AFUhZE8L8yMnO9Jw?usp=sharing)"
+
+# Create the Gradio demo
+demo = gr.Interface(fn=predict,
+                    inputs=gr.Textbox(lines=2, placeholder="Type your text here..."),
+                    outputs=[gr.JSON(label="Top Responses"),
+                             gr.Textbox(label="Actual Response", interactive=False),
+                             gr.Number(label="Prediction time (s)")],
+                    examples=questions_texts,  # use the example questions loaded above
+                    title=title,
+                    description=description,
+                    article=article)
+
+# Launch the demo
+demo.launch()
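app.py loads `response_embeddings.pkl.gz` and `response_list.pkl.gz`, but neither file is added by this commit. A minimal sketch of how those artifacts could be precomputed with the same model follows; the `candidate_responses` list is a hypothetical stand-in for the HC3 responses:

```python
# Sketch (assumption): precomputing the response embeddings and response
# list that app.py expects. `candidate_responses` is hypothetical; in
# practice it would be drawn from the Hello-SimpleAI/HC3 dataset.
import gzip
import pickle
import torch
from model import create_semantic_ranking_model

model, tokenizer = create_semantic_ranking_model()
model.eval()

candidate_responses = ["First candidate answer.", "Second candidate answer."]

# Tokenize with the same settings app.py uses for queries
tokenized = tokenizer(candidate_responses, return_tensors="pt",
                      max_length=128, truncation=True, padding="max_length")
with torch.inference_mode():
    embeddings = model(**tokenized)  # shape: [num_responses, 384]

# Save the tensor and the parallel list of texts as gzipped pickles
with gzip.open("response_embeddings.pkl.gz", "wb") as f:
    pickle.dump(embeddings, f)
with gzip.open("response_list.pkl.gz", "wb") as f:
    pickle.dump(candidate_responses, f)
```

With a `[num_responses, 384]` tensor saved this way, the `unsqueeze`/`cosine_similarity`/`topk` pipeline in `predict` lines up shape-for-shape.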
model.py
ADDED
@@ -0,0 +1,41 @@
+import torch
+from torch import nn
+from transformers import AutoTokenizer, AutoModel
+
+class AutoModelForSentenceEmbedding(nn.Module):
+    def __init__(self, model):
+        super().__init__()
+
+        self.model = model
+
+    def forward(self, **kwargs):
+        model_output = self.model(**kwargs)
+        embeddings = self.mean_pooling(model_output, kwargs['attention_mask'])
+        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)  # L2-normalize so cosine similarity reduces to a dot product
+        return embeddings
+
+    def mean_pooling(self, model_output, attention_mask):
+        token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+def create_semantic_ranking_model(device=device):
+    """Creates a HuggingFace all-MiniLM-L6-v2 sentence embedding model.
+
+    Args:
+        device: A torch.device to place the model on.
+    Returns:
+        A tuple of (model, tokenizer).
+    """
+    tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
+    model = AutoModelForSentenceEmbedding(AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')).to(device)
+
+    # Freeze the base model's parameters
+    for param in model.model.parameters():
+        param.requires_grad = False
+
+    return model, tokenizer
+
+# Example usage (guarded so importing this module does not build the model a second time)
+if __name__ == "__main__":
+    model, tokenizer = create_semantic_ranking_model()
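Because the forward pass mean-pools the token embeddings and then L2-normalizes, every output vector has unit norm, so cosine similarity between two embeddings is just their dot product. A quick usage sketch (the example sentences are arbitrary):

```python
# Usage sketch: embed two sentences and compare them with a dot product.
import torch
from model import create_semantic_ranking_model

model, tokenizer = create_semantic_ranking_model()
inputs = tokenizer(["How do magnets work?", "Why do magnets attract iron?"],
                   return_tensors="pt", truncation=True, padding=True)
with torch.inference_mode():
    emb = model(**inputs)  # shape: [2, 384], rows are L2-normalized

print(emb.norm(dim=1))  # ~tensor([1., 1.]) because of the normalize step
print(emb[0] @ emb[1])  # cosine similarity, since both vectors are unit-length
```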
questions_texts.txt
ADDED
@@ -0,0 +1,5 @@
+Formatting on computers For example , why ca n't Microsoft Word open a .odt file ? What happens when a file gets corrupted / unstable ? Why not have just one format for all videos and one format for all pictures ? Explain like I'm five.
+How do companies like apple keep track of laws in different countries ? in light of france suing apple due to a law in france that say that planned obsolescense is illegal , how can an international company keep track of every thing Explain like I'm five.
+Can animals commit suicide ? i always wondered if they can , like can animal become so sad that 's he eventually kill himself ? Explain like I'm five.
+the Monty Hall problem I think I get it , but then I ... don't . Why would it be advantageous to switch after one door is opened . Sure , it 's 50/50 instead of 1/3 , but it just does n't make sense to me . Explain like I'm five.
+Why ca n't we move our eyes independently ? How come both of our eyes move in the same direction ? Is this muscle related or nerve related ? Also is there any benefit to this or is it simply necessary for us . Please explain like I'm five.
requirements.txt
ADDED
@@ -0,0 +1,4 @@
+torch==2.1.0
+torchvision==0.16.0
+gradio==3.50.2
+transformers==4.35.0