mavihsrr committed on
Commit
e5b5e02
1 Parent(s): fda6337

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 384,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
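This config enables only `pooling_mode_cls_token`: the sentence embedding is simply the hidden state of the first (`[CLS]`) token. A minimal NumPy sketch of that operation, using a random array as a stand-in for the transformer's token embeddings:

```python
import numpy as np

def cls_pooling(token_embeddings):
    # token_embeddings: (batch, seq_len, hidden). CLS pooling keeps only
    # the first token's vector per sequence as the sentence embedding.
    return token_embeddings[:, 0, :]

# Stand-in for BERT output: 2 sentences, 16 tokens, 384-dim hidden states
hidden_states = np.random.randn(2, 16, 384)
sentence_embeddings = cls_pooling(hidden_states)  # shape (2, 384)
```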
README.md ADDED
@@ -0,0 +1,561 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:69227
- loss:CosineSimilarityLoss
base_model: BAAI/bge-small-en-v1.5
widget:
- source_sentence: Gliss Hair Repair Conditioner Color Protect & Shine. Description
    :This conditioner is designed for long-lasting colour protection for your coloured
    hair. The ultimate colour conditioner gives up to 10 weeks of colour protection
    and intense luminosity. The effective formula with the repair serum and the UV
    filter repairs the hair, seals and protects the colour perfectly from washing
    out and fading. It provides optimal colour protection for coloured hair up to
    10 weeks with regular use.Hate hair drama? Then try Gliss Hair Repair products
    for beautiful, restored and healthy-looking hair. GLISS Hair Repair products with
    breakthrough patented Hair-Identical Keratin leverage technology to up to 10 layers
    deep. Combined with essential benefits like colour protection, intense hydration,
    long-lasting volume, and weightless nourishment, you get the repair you need without
    having to compromise.!
  sentences:
  - Dairy Milk Honeycomb & Nuts - Imported. Description :Cadbury dairy milk is all
    about regaling in the richness and creaminess of these classic chocolate bars.
    These chocolate bars are available in a number of diverse flavours that offer
    you a reason to celebrate every small and big occasion of happiness.!
  - 'product

    Bio Flame Of The Forest - Fresh Shine Expertise Oil Bio Flame Of The Forest
    - Fresh Shine Expertis...

    Bio Flame Of The Forest - Fresh Shine Expertise Oil Bio Flame Of The Forest
    - Fresh Shine Expertis...

    Name: combined, dtype: object'
  - Hygiene Hand Wipes With Anti-bacterial Actives- Skin-Friendly. Description :Have
    you stepped out of your house and wondered if the door that you just pushed open,
    was clean? Are there germs lurking around you, that you wish you could see better?
    Have you wondered if you have been careful in ensuring the best protection against
    bacteria and germs? Is your personal hygiene standard good enough? Personal Hygiene
    is in your hands. Literally. KeepSafe by Marico takes care of your Hygiene needs
    through its range of premium quality sanitizer, disinfectants, wipes, hand wash
    and personal hygiene products.KeepSafe Hygiene Hand Wipes are rich in anti-bacterial
    actives that sanitise and effectively fight germs. These wipes are rich in Aloe
    Vera and Glycerine and are mild and soothing on the skin. These hygiene wipes
    are so soft, that you can use them every day, as many times as you want. Like
    a true Marico product, KeepSafe believes in transparency, superior quality and
    complete essential care. Try out the Multi-purpose Disinfectant and the Instant
    Hand Sanitiser from KeepSafe by Marico range for complete out-of-home hygiene.
    Take No Chances. Keep Safe.!
- source_sentence: Fragrance Body Spray For Men (1000 sprays) - Forever. Description
    :Soothing experience throughout the day, Consists of refreshing & Long lasting
    fragrance. For Beauty tips, tricks & more visit https://bigbasket.blog/!
  sentences:
  - M2 Perfume Spray - for Men. Description :Engage Perfume Sprays created by International
    Experts For Beauty tips, tricks & more visit http://lookbeautiful.in/ For Beauty
    tips, tricks & more visit https://bigbasket.blog/!
  - 'product

    Dazzle Opalware Noodle Bowl Set - Tropical Lagoon Dazzle Opalware Noodle Bowl
    Set - Tropical Lag...

    Dazzle Opalware Noodle Bowl Set - Tropical Lagoon Dazzle Opalware Noodle Bowl
    Set - Tropical Lag...

    Name: combined, dtype: object'
  - 'product

    Hakka Noodles - Veg Hakka Noodles - Veg. Description :Ching''s Secr...

    Hakka Noodles - Veg Hakka Noodles - Veg. Description :It is ready ...

    Name: combined, dtype: object'
- source_sentence: Amlant Ayurvedic Medicine For Acidity
  sentences:
  - Grapes - Thompson Seedless
  - 'Ice Cream Bowl. Description :Excellent quality crystal clear glass

    Easy to handle

    Ideal for gifting

    Dishwasher safe

    This glass is made from high-quality material & crafted in a new design for easy
    handling.!'
  - Melamine Snack Set - Red. Description :Made of 100% food-grade melamine and food
    contact grade colour, this snack set is heat resistant up to a temp of 140º C.
    It is resistant to breaking, cracking & chipping. Stain-proof, it comes with long-lasting
    designs. It has a glazed finish that makes it aesthetically pleasing. This snack
    set is safe for dishwasher use.!
- source_sentence: Wonder Pants - Small, Combo. Description :Your baby spends a good
    part of their day in a diaper. Therefore, choosing the right diaper for their
    tender and delicate skin is extremely important. And this is where, we introduce
    our next-generation product, India's 1st diaper pants with the unique Bubble-Bed
    technology. There are 3 areas where a diaper surrounds the baby's skin-the baby's
    bottom, the baby's waist, and the baby's thigh. The skin of the baby in all these
    areas is extremely delicate and sensitiveHuggies Wonder Pants Diapers Small Size
    pack with 3D Bubble bed technology with a Cushiony Waistband.!
  sentences:
  - Home Mate Garbage Bag - Green, Oxo-Bio-Degradable Roll, 30X37, 50 Micron. Description
    :These garbage bags are designed to ease the task of garbage disposal, and the
    bio-degradable material, makes it environment friendly and helps spread the word
    of hygiene and cleanliness. They are strong enough to carry waste neatly without
    causing a mess, and large enough to carry it all at once. They also offer great
    flexibility, convenience and ensure a high degree of hygiene, whether at home
    or in office.!
  - 'product

    Baby Diapers & Sanitary Disposal Bag Baby Diapers & Sanitary Disposal Bag.
    Descript...

    Baby Diapers & Sanitary Disposal Bag Baby Diapers & Sanitary Disposal Bag.
    Descript...

    Name: combined, dtype: object'
  - 'product

    Organic - Sugar/Sakkare Brown Organic - Sugar/Sakkare Brown. Description :Pu...

    Organic - Sugar/Sakkare Brown Organic - Sugar/Sakkare Brown. Description :Tu...

    Name: combined, dtype: object'
- source_sentence: Coffee Filter Papers - Size 02, White. Description :Hario brings
    in Cone-shaped natural paper filter for Pour-over brewing experience for a great
    cup of Coffee. Hario's V60, size 02 White, give you a perfect brew in comparison
    to mesh filters. These paper filters are of great quality and they produce a clean,
    flavorful, sediment-free cup. They are disposable, and thus it makes it convenient
    and easier to use for brewing and cleanup. Perfect choice for coffee enthusiasts
    who like to grind their coffee at home. These papers are safe to use and eco-friendly.
    The Box comes with 100 disposable 02 paper filters.!
  sentences:
  - Tomato Disc 70 g + Cheese Balls 70 g
  - 'product

    4mm Aluminium Induction Base Chapati Roti Tawa - Silver 4mm Aluminium Induction
    Base Chapati Roti Tawa...

    4mm Aluminium Induction Base Chapati Roti Tawa - Silver 4mm Aluminium Induction
    Base Chapati Roti Tawa...

    Name: combined, dtype: object'
  - Steel Rice Serving Spoon - Medium, Classic Diana Series, BBST37. Description :BB
    Home provides fine and classy cooking and serving tools that can make difference
    to your kitchen experience. These cooking/serving tools are made from 100% food
    grade stainless steel. The handle is designed in a way so it does not feel heavy
    while cooking/serving. It is easy to store as it has a bottom hole on the handle
    to hang it on the wall.!
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en-v1.5
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: bge eval
      type: bge-eval
    metrics:
    - type: pearson_cosine
      value: 0.9791486195203369
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.15795715637146185
      name: Spearman Cosine
    - type: pearson_cosine
      value: 0.9798210832808076
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.1632937701650559
      name: Spearman Cosine
---

# SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <!-- at revision 5c38ec7c405ec4b44b94cc5a9bb96e735b38267a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mavihsrr/bgeEmbeddingsRetailedFT")
# Run inference
sentences = [
    "Coffee Filter Papers - Size 02, White. Description :Hario brings in Cone-shaped natural paper filter for Pour-over brewing experience for a great cup of Coffee. Hario's V60, size 02 White, give you a perfect brew in comparison to mesh filters. These paper filters are of great quality and they produce a clean, flavorful, sediment-free cup. They are disposable, and thus it makes it convenient and easier to use for brewing and cleanup. Perfect choice for coffee enthusiasts who like to grind their coffee at home. These papers are safe to use and eco-friendly. The Box comes with 100 disposable 02 paper filters.!",
    'Steel Rice Serving Spoon - Medium, Classic Diana Series, BBST37. Description :BB Home provides fine and classy cooking and serving tools that can make difference to your kitchen experience. These cooking/serving tools are made from 100% food grade stainless steel. The handle is designed in a way so it does not feel heavy while cooking/serving. It is easy to store as it has a bottom hole on the handle to hang it on the wall.!',
    'Tomato Disc 70 g + Cheese Balls 70 g',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
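Because the model's final `Normalize()` module makes every embedding unit-length, the cosine similarity reported above reduces to a plain dot product between embedding rows. A minimal NumPy sketch of that computation, using random vectors as a stand-in for real `model.encode(...)` output:

```python
import numpy as np

def similarity_matrix(embeddings):
    # L2-normalize the rows (a no-op when the model's Normalize() module
    # has already unit-scaled them), then take all pairwise dot products.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

embeddings = np.random.randn(3, 384)  # stand-in for model.encode(sentences)
sim = similarity_matrix(embeddings)   # (3, 3); diagonal entries are 1.0
```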
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity

* Dataset: `bge-eval`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| pearson_cosine      | 0.9791    |
| **spearman_cosine** | **0.158** |

#### Semantic Similarity

* Dataset: `bge-eval`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.9798     |
| **spearman_cosine** | **0.1633** |

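Pearson measures linear agreement with the gold scores, while Spearman only looks at rank order. With gold scores packed into a narrow band (mean ≈ 0.88, max 0.96 in the dataset statistics below), small prediction errors can reshuffle ranks and depress Spearman even while Pearson stays near 1, which is consistent with the gap in the tables above. A small NumPy sketch of both statistics (tie correction omitted; the evaluator itself uses scipy's implementations):

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation: cosine of the mean-centered vectors
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def spearman(a, b):
    # Spearman is Pearson computed on the ranks (no tie handling here)
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson(rank(np.asarray(a)), rank(np.asarray(b)))

# Tightly clustered gold scores: tiny errors can still swap ranks
gold = [0.90, 0.91, 0.92, 0.95]
pred = [0.89, 0.93, 0.90, 0.96]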
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 69,227 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | score |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details | <ul><li>min: 4 tokens</li><li>mean: 114.97 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 101.87 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 0.18</li><li>mean: 0.88</li><li>max: 0.96</li></ul> |
* Samples:
  | sentence1 | sentence2 | score |
  |:----------|:----------|:------|
  | <code>Breakfast Mix - Masala Idli. Description :Established in 1924, MTR is the contemporary way to authentic tasting food, Our products are backed by culinary expertise honed, over 8 decades of serving wholesome, tasty and high quality vegetarian food, Using authentic Indian recipes, the purest and best quality natural ingredients and traditional methods of preparation, We brings you a range of products of unmatched flavour and taste, to delight your family at every meal and every occasion, MTR Daily Favourites is your dependable partner in the Kitchen that helps you make your family's everyday meals tasty and wholesome, So bring home the confidence of great tasting food everyday with MTR..!</code> | <code>Quinoa Flakes. Description :Keep a good balance of satisfying your taste buds and satiating your hunger pangs. Nutriwish Quinoa Flakes are a “complete” protein containing all eight essential amino acids. The perfect antidote to all that sugar, Nutriwish Quinoa Flakes are delicious cold in a salad, served warm as a side dish or even combined with vegetables and dairy to make a spectacular and filling vegetarian main course. Curb food cravings and start your day yummy with the starchy Nutriwish Quinoa Flakes.!</code> | <code>0.9524586385560029</code> |
  | <code>1 To 1 Baking Flour - Gluten Free. Description :Bob Red Mill gluten-free 1-to-1 baking flour makes it easy to transform traditional recipes to gluten-free. Simply follow your favourite baking recipe, replacing the wheat flour with this blend. It is formulated for baked goods with terrific taste and texture, no additional speciality ingredients or recipes required. It is suitable for cookies, cakes, brownies, muffins, and more.!</code> | <code>Chocolate - Drink Powder. Description :Hintz cocoa powder is not just ideal for making biscuits, ice cream and deserts. It is also dissolved in hot milk - a delicious chocolate beverage.!</code> | <code>0.8764388983469142</code> |
  | <code>Joy Round Kids Glass. Description :This glass, made of plastic material, is specially designed for your kid. It is lightweight and easy to use. This glass is ideal for drinking water, milk, juices, health drinks etc.!</code> | <code>Plastic Lunch Box/Tiffin Box - Disney Mickey Mouse, BPA Free, HMHILB 199-MK. Description :HMI brings this 4 side lock and lock style. This is airtight, leak-proof and microwave safe. It comes with a small container, fork & spoon.!</code> | <code>0.9289614489097255</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
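For a single training pair, this objective computes the cosine similarity of the two sentence embeddings and regresses it onto the gold score with the `MSELoss` named in the parameters. A NumPy sketch of that per-pair loss (the real loss operates on batched torch tensors):

```python
import numpy as np

def cosine_similarity_loss(u, v, score):
    # CosineSimilarityLoss with loss_fct=MSELoss: squared error between
    # cos(u, v) and the gold similarity score for this pair.
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - score) ** 2

u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
```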
### Evaluation Dataset

#### Unnamed Dataset

* Size: 8,654 evaluation samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | score |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details | <ul><li>min: 4 tokens</li><li>mean: 110.58 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 97.13 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 0.19</li><li>mean: 0.87</li><li>max: 0.96</li></ul> |
* Samples:
  | sentence1 | sentence2 | score |
  |:----------|:----------|:------|
  | <code>1947 Flora Natural Aroma Incense Sticks - Economy Pack. Description :A Traditional formula that is handed over by the founder, incense sticks is made the traditional way with a ‘masala’ or mixture of 100% natural aromatic botanicals. During your rituals, these incense sticks will bring about a fresh and fragrant breath of conscious soothing bliss.!</code> | <code>Designer Jyot - Green. Description :This is made in India Initiative and create a meditative and peaceful ambience in your puja room with the handmade Brass Mandir Jyot. It extremely durable and crack-resistant, which allows you to use it with ease on a daily basis. This Jyot is very attractive and worth purchasing for personal use or for gifting purpose. Easy to Use and Clean. This Glass brass diya is designed for ease in inserting whip, refilling oil and cleaning. It emits brighter light due to the increased clarity provided by the superior quality glass. The flame of this brass diya does not go off or cause any danger even when the fan is on as the diya comes with a lid.!</code> | <code>0.9030882765047124</code> |
  | <code>Mexican Seasoning. Description :The rich tapestry of sweet and spicy flavours that Mexican cuisine is loved for - now captured in a magic blend. This international seasoning product is inbuilt with unique 2-way flip cap to sprinkle it or scoop it. On1y is a new way of rediscovering the power of herbs and spices. On1y can conveniently become a part of your daily diet for the irresistible benefits that it brings.!</code> | <code>Rainbow Strands. Description :Colourful jimmies/sprinkles make decorating your cakes, cupcakes and cookies fun and easy. Great as an ice cream topping too.!</code> | <code>0.9584305870004965</code> |
  | <code>Intense 75% Dark Chocolate. Description :This pack has 100gm 75% Luxury Intense Dark Chocolate. With meticulous culinary skills the exotic intense bitterness of cacao beans emerges in this bar. Chocolate was invented in 1900 BC by the Aztecs in Central America. We at Didier & Frank bring you those exotic flavours and hand crafted chocolates that the Aztecs enjoyed secretly. Today, Didier & Frank makes the best chocolates in the world.!</code> | <code>Puff Pastry Sticks With Butter. Description :The unique and timeless original Classic Millefoglie by Matilde Vicenzi: crumbly sticks of delicate pastry typical of the Italian tradition, with all the flavour of butter. With 192 crispy and delicate layers of puff pastry and just a light layer of premium butter, our inimitable Millefoglie d’Italia are among the most popular desserts in Italy.!</code> | <code>0.9553127949715517</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `bf16`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
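The `lr_scheduler_type: linear` and `warmup_ratio: 0.1` settings above mean the base learning rate of 2e-05 is scaled by a multiplier that ramps from 0 to 1 over the first 10% of steps and then decays linearly back to 0. A sketch of that multiplier (step counts below are illustrative, not this run's actual step count):

```python
def lr_multiplier(step, total_steps, warmup_ratio=0.1):
    # Linear warmup over the first warmup_ratio fraction of training,
    # then linear decay to zero (lr_scheduler_type=linear).
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Effective learning rate at any step is 2e-05 * lr_multiplier(step, total_steps)
```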
487
+
488
+ ### Training Logs
489
+ | Epoch | Step | Training Loss | Validation Loss | bge-eval_spearman_cosine |
490
+ |:------:|:----:|:-------------:|:---------------:|:------------------------:|
491
+ | 0 | 0 | - | - | 0.0923 |
492
+ | 0.0231 | 100 | 0.0657 | 0.0386 | 0.1450 |
493
+ | 0.0462 | 200 | 0.0248 | 0.0133 | 0.1661 |
494
+ | 0.0693 | 300 | 0.0118 | - | - |
495
+ | 0.0231 | 100 | 0.0069 | 0.0070 | 0.1644 |
496
+ | 0.0462 | 200 | 0.0037 | 0.0040 | 0.1634 |
497
+ | 0.0693 | 300 | 0.0016 | 0.0038 | 0.1619 |
498
+ | 0.0924 | 400 | 0.0013 | 0.0042 | 0.1603 |
499
+ | 0.1156 | 500 | 0.0011 | 0.0049 | 0.1579 |
500
+ | 0.1387 | 600 | 0.0012 | 0.0052 | 0.1593 |
501
+ | 0.1618 | 700 | 0.0011 | 0.0053 | 0.1608 |
502
+ | 0.1849 | 800 | 0.0011 | 0.0055 | 0.1612 |
503
+ | 0.2080 | 900 | 0.0011 | 0.0063 | 0.1606 |
504
+ | 0.2311 | 1000 | 0.0011 | 0.0061 | 0.1585 |
505
+ | 0.2542 | 1100 | 0.0012 | 0.0061 | 0.1566 |
506
+ | 0.2773 | 1200 | 0.0011 | 0.0062 | 0.1557 |
507
+ | 0.3004 | 1300 | 0.0012 | 0.0062 | 0.1570 |
508
+ | 0.3235 | 1400 | 0.001 | 0.0058 | 0.1557 |
509
+ | 0.3467 | 1500 | 0.001 | 0.0063 | 0.1554 |
510
+ | 0.3698 | 1600 | 0.0011 | 0.0062 | 0.1572 |
511
+ | 0.3929 | 1700 | 0.0011 | 0.0061 | 0.1580 |
512
+ | 0.4160 | 1800 | 0.001 | - | 0.1598 |
513
+ | 0.2311 | 1000 | 0.0008 | 0.0063 | 0.1532 |
514
+ | 0.4622 | 2000 | 0.0008 | 0.0064 | 0.1651 |
515
+ | 0.6933 | 3000 | 0.001 | 0.0067 | 0.1627 |
516
+ | 0.9244 | 4000 | 0.001 | 0.0067 | 0.1633 |
517
+
518
+
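The `bge-eval_spearman_cosine` column above is the Spearman rank correlation between the cosine similarities of paired sentence embeddings and the gold similarity labels. A minimal sketch of that computation, using toy embeddings and labels (the function name and data here are illustrative, not part of the model card):

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_cosine(emb1: np.ndarray, emb2: np.ndarray, gold: np.ndarray) -> float:
    """Spearman rank correlation between row-wise cosine similarities and gold scores."""
    num = (emb1 * emb2).sum(axis=1)
    denom = np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    cos = num / denom
    return spearmanr(cos, gold).correlation


# Toy example: 8 random embedding pairs at this model's dimension (384).
rng = np.random.default_rng(0)
e1 = rng.normal(size=(8, 384))
e2 = rng.normal(size=(8, 384))
gold = rng.uniform(size=8)
score = spearman_cosine(e1, e2, gold)  # a value in [-1, 1]
```

Because the metric is rank-based, it rewards embeddings that *order* pairs correctly even when the raw cosine values are not calibrated to the label scale.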
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.47.1
+ - PyTorch: 2.1.0+cu118
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "_name_or_path": "BAAI/bge-small-en-v1.5",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 384,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 1536,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.47.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.3.1",
+ "transformers": "4.47.1",
+ "pytorch": "2.1.0+cu118"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c830858b66dbefec3cec099c7b8130e2c7c01a5039dd5f69b6d2d4961c38bb0a
+ size 133462128
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
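The `modules.json` above declares a three-stage pipeline: BERT encoder, then pooling (CLS-token mode, per `1_Pooling/config.json`), then L2 normalization, so cosine similarity between outputs reduces to a dot product. A minimal numpy sketch of the last two stages, using toy token embeddings in place of real encoder output:

```python
import numpy as np


def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    # pooling_mode_cls_token=true: the sentence embedding is the
    # embedding of the first ([CLS]) token of each sequence.
    return token_embeddings[:, 0, :]


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # The Normalize module scales each embedding to unit length,
    # so cosine similarity afterwards is just a dot product.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Toy encoder output: (batch=2, seq_len=5, hidden_size=384).
tokens = np.random.default_rng(0).normal(size=(2, 5, 384))
emb = l2_normalize(cls_pool(tokens))
cos_sim = emb[0] @ emb[1]  # cosine similarity via dot product
```

In the real model these two stages are the `Pooling` and `Normalize` modules applied to the BERT encoder's token embeddings.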
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff