Catherine Breslin committed
Commit · 37974d1
1 Parent(s): 8b28383
Smart Search Demo
- app.py +46 -0
- requirements.txt +6 -0
app.py
ADDED
@@ -0,0 +1,46 @@
+import streamlit as st
+import nltk
+from transformers import pipeline
+from sentence_transformers import SentenceTransformer
+from scipy.spatial.distance import cosine
+import numpy as np
+
+st.header("Smart Search Demo")
+st.markdown("This demo uses the sentence_transformers library to find close matches of search sentences in a long block of text. Change the sentences and try your own!")
+
+# Streamlit text boxes
+# source: https://www.gutenberg.org/cache/epub/1597/pg1597.txt
+short_text = st.text_area('Enter search sentence(s):', value="The Prince wanted to marry a true Princess. True Princesses are very sensitive.")
+long_text = st.text_area('Enter long block of text to search within:', value="There was once a Prince who wished to marry a Princess; but then she must be a real Princess. He travelled all over the world in hopes of finding such a lady; but there was always something wrong. Princesses he found in plenty; but whether they were real Princesses it was impossible for him to decide, for now one thing, now another, seemed to him not quite right about the ladies. At last he returned to his palace quite cast down, because he wished so much to have a real Princess for his wife.\nOne evening a fearful tempest arose, it thundered and lightened, and the rain poured down from the sky in torrents: besides, it was as dark as pitch. All at once there was heard a violent knocking at the door, and the old King, the Prince's father, went out himself to open it.\nIt was a Princess who was standing outside the door. What with the rain and the wind, she was in a sad condition; the water trickled down from her hair, and her clothes clung to her body. She said she was a real Princess.\n 'Ah! we shall soon see that!' thought the old Queen-mother; however, she said not a word of what she was going to do; but went quietly into the bedroom, took all the bed-clothes off the bed, and put three little peas on the bedstead. She then laid twenty mattresses one upon another over the three peas, and put twenty feather beds over the mattresses.\nUpon this bed the Princess was to pass the night.\nThe next morning she was asked how she had slept. 'Oh, very badly indeed!' she replied. 'I have scarcely closed my eyes the whole night through. I do not know what was in my bed, but I had something hard under me, and am all over black and blue. It has hurt me so much!'\nNow it was plain that the lady must be a real Princess, since she had been able to feel the three little peas through the twenty mattresses and twenty feather beds. None but a real Princess could have had such a delicate sense of feeling.\nThe Prince accordingly made her his wife; being now convinced that he had found a real Princess. The three peas were however put into the cabinet of curiosities, where they are still to be seen, provided they are not lost. \nWasn't this a lady of real delicacy?")
+
+# Model setup
+model = SentenceTransformer('paraphrase-distilroberta-base-v1')
+nltk.download('punkt')
+
+# Run model
+if short_text and long_text:
+    short_sentences = nltk.tokenize.sent_tokenize(short_text)
+    long_sentences = nltk.tokenize.sent_tokenize(long_text)
+    embed_short = model.encode(short_sentences)
+    embed_long = model.encode(long_sentences)
+
+    sim = np.zeros([len(embed_short), len(embed_long)])
+    for i,em in enumerate(embed_short):
+        for j,ea in enumerate(embed_long):
+            sim[i][j] = 1.0-cosine(em,ea)
+
+    # Sort similarities per input message
+    idx_sorted = np.zeros([len(embed_short), len(embed_long)])
+    for i,s in enumerate(sim):
+        idx_sorted[i] = np.argsort(s)
+    idx_sorted = np.fliplr(idx_sorted)
+
+    # Find top N matches
+    N=5
+    for i, m in enumerate(short_sentences):
+        st.markdown("SEARCH SENTENCE: " + m)
+        st.markdown("TOP MATCHES: ")
+        for j in range(min(N, len(long_sentences))):  # cap N so short texts do not raise IndexError
+            st.markdown("* " + long_sentences[int(idx_sorted[i][j])])
+        st.markdown("-----")
+
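Review note: app.py fills the similarity matrix one pair at a time with `scipy.spatial.distance.cosine` and then sorts each row in a second loop. The same ranking can be sketched with plain NumPy array operations. This is a minimal sketch, not part of the commit: it assumes the embeddings are 2-D arrays with no all-zero rows (which is what `model.encode` is expected to return for a list of sentences), and `cosine_similarity_matrix` and `top_matches` are hypothetical helper names. Toy vectors stand in for real sentence embeddings.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return the (len(a), len(b)) matrix of cosine similarities between rows."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalise each row to unit length; the dot product of two unit
    # vectors is their cosine similarity. Assumes no all-zero rows.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def top_matches(sim, n=5):
    """Return, for each query row, the column indices of the n highest similarities."""
    # Never ask for more matches than there are candidate sentences.
    n = min(n, sim.shape[1])
    # argsort is ascending, so sort the negated scores to get descending order.
    return np.argsort(-sim, axis=1)[:, :n]


# Toy 2-D vectors in place of sentence embeddings (illustrative values only).
queries = np.array([[1.0, 0.0]])
candidates = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])

sim = cosine_similarity_matrix(queries, candidates)
best = top_matches(sim, n=5)
```

Because `np.argsort` returns integer indices, the rows of `best` can index `long_sentences` directly, without the `int(...)` cast that the float-valued `idx_sorted` array requires.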
requirements.txt
ADDED
@@ -0,0 +1,6 @@
+torch
+transformers
+sentence_transformers
+nltk
+scipy
+numpy