---
base_model: dunzhang/stella_en_1.5B_v5
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:99000
- loss:MultipleNegativesSymmetricRankingLoss
widget:
- source_sentence: 'Instruct: Given a web search query, retrieve relevant passages
    that answer the query.

    Query: Glay'
  sentences:
  - The Theory of Good and Evil is a 1907 book about ethics by the English philosopher
    Hastings Rashdall, in which the author expounds a theory he calls "ideal utilitarianism".
    It has been seen as Rashdall's most important philosophical work.
  - GLAY is a Japanese rock band , formed in Hakodate in 1988 . Glay primarily composes
    songs in the rock and pop genres , but they have also arranged songs using elements
    from a wide variety of genres , including punk , electronic , R&B , progressive
    rock , folk , reggae , gospel , and ska . Originally a visual kei band , the group
    slowly shifted to less dramatic attire through the years . As of 2008 , Glay had
    sold an estimated 51 million records ; 28 million singles and 23 million albums
    , making them one of the top ten best-selling artists of all time in Japan .
  - Aashirwad is a 1968 Bollywood film , directed by Hrishikesh Mukherjee . The film
    stars Ashok Kumar and Sanjeev Kumar .   The film is notable for its inclusion
    of a rap-like song performed by Ashok Kumar , `` Rail Gaadi '' .
- source_sentence: 'Instruct: Given a web search query, retrieve relevant passages
    that answer the query.

    Query: Indexing does not work with index package'
  sentences:
  - 'I am trying to do indexing with the following code:               \documentclass[a4paper]{article}     \usepackage{index}     \makeindex     \newindex{aut}{adx}{and}{Name
    Index}     \begin{document}     Hellow \index[aut]{FiRST}     \printindex[aut]     \end{document}      Acccording
    to documention of the `index` package it should work. But makeindex creates empty
    `.idx` and `.ind`. If I run code like this:               \documentclass[a4paper]{article}     \usepackage{index}     \makeindex     \begin{document}      Hellow
    \index{FiRST}     \printindex     \end{document}      It runs. But I need to have
    user-defined index. Please help me with it. I''ve searched for several hours on
    internet, but without success.'
  - 'Body materials may include, but are not limited to, any of these materials:'
  - Berberis aemulans is a shrub endemic to the region of Sichuan in southern China.
    It grows there in thickets and on slopes at elevations of 2900-3200 m.Berberis
    aemulans is a deciduous shrub up to 2 m tall, with spines along the branches.
    Leaves are simple, elliptical to ovate, up to 4 cm long, lighter in color on the
    underside because of a waxy layer. Flowers are in simple racemes of only a few
    flowers. Berries egg-shaped, orange, up to 16 mm long.
- source_sentence: 'Instruct: Given a web search query, retrieve relevant passages
    that answer the query.

    Query: Parodi''s hemispingus'
  sentences:
  - Another event dubbed a "Battle of the Sexes" took place during the 1998 Australian
    Open[51] between Karsten Braasch and the Williams sisters. Venus and Serena Williams
    had claimed that they could beat any male player ranked outside the world's top
    200, so Braasch, then ranked 203rd, challenged them both. Braasch was described
    by one journalist as "a man whose training regime centered around a pack of cigarettes
    and more than a couple bottles of ice cold lager".[52][51] The matches took place
    on court number 12 in Melbourne Park,[53] after Braasch had finished a round of
    golf and two shandies. He first took on Serena and after leading 5–0, beat her
    6–1. Venus then walked on court and again Braasch was victorious, this time winning
    6–2.[54] Braasch said afterwards, "500 and above, no chance". He added that he
    had played like someone ranked 600th in order to keep the game "fun".[55] Braasch
    said the big difference was that men can chase down shots much easier, and that
    men put spin on the ball that the women can't handle. The Williams sisters adjusted
    their claim to beating men outside the top 350.[51]
  - The Parodi 's hemispingus ( Hemispingus parodii ) is a species of bird in the
    family Thraupidae that is endemic to Peru .   Its natural habitat is subtropical
    or tropical moist montane forests .
  - 'I need help because my Minecraft launcher doesn''t work... It''s been a long
    time I haven''t played Minecraft and until now it worked nicely. But now that
    I want to play on it again and I run the launcher, this appears (click images
    to enlarge): ![enter image description here](http://i.stack.imgur.com/hvD9R.png)
    At the bottom left of the screen the profile names keep loading (normally my username
    appears in the box) and as you can see I am unable to click on the "Play" button.
    I tried creating another profile but it doesn''t work because soon after they
    ask to enter my Minecraft username and password. The password I entered disappears
    and it keeps loading (I''ve tried waiting like, 30 minutes and it still doesn''t
    work) so this is definitely not normal. ![enter image description here](http://i.stack.imgur.com/yDYjX.png)
    ![enter image description here](http://i.stack.imgur.com/4Nf1L.png) ![enter image
    description here](http://i.stack.imgur.com/T6cJu.png) So basically I can''t play
    on Minecraft anymore (version 1.7.9)... P.S. I use Windows 7.'
- source_sentence: 'Instruct: Given a web search query, retrieve relevant passages
    that answer the query.

    Query: Mahabharata'
  sentences:
  - The epic employs the story within a story structure, otherwise known as frametales,
    popular in many Indian religious and non-religious works. It is first recited
    at Takshashila by the sage Vaiśampāyana,[12][13] a disciple of Vyāsa, to the King
    Janamejaya who is the great-grandson of the Pāṇḍava prince Arjuna. The story is
    then recited again by a professional storyteller named Ugraśrava Sauti, many years
    later, to an assemblage of sages performing the 12-year sacrifice for the king
    Saunaka Kulapati in the Naimiśa Forest.
  - 'Guncati (Serbian Cyrillic: Гунцати) is a suburban settlement of Belgrade, the
    capital of Serbia. It is located in the municipality of Barajevo.Guncati is located
    west of the municipal seat of Barajevo, halfway between the Belgrade-Bar railway
    and Ibarska magistrala (Highway of Ibar).It is a rural settlement with a steady
    population growth: from 1,718 (Census 1991) to 2,102 (Census 2002).'
  - Beck 's Brewery , also known as Brauerei Beck & Co. , is a brewery in the northern
    German city of Bremen . In 2001 , Interbrew agreed to buy Brauerei Beck for 1.8
    billion euro ; at that time it was the fourth largest brewer in Germany . US manufacture
    of Beck 's Brew has been based in St. Louis , Missouri , since early 2012 but
    some customers have rebelled against the US market version .   Since 2008 , it
    has been owned by the Interbrew subsidiary of Anheuser-Busch InBev SA/NV .   The
    Beck 's Art Label Campaign has offered artists the opportunity to provide designs
    to replace the brand 's label . It started in London in 1987 with Gilbert and
    George . The artists created an art label , because Beck 's sponsored their retrospective
    at the Hayward Gallery . The labels of the 2000 limited edition Beck 's bottles
    were matching their exhibition poster . Other participants of the Art Label Campaign
    are members of the loose group `` Young British Artists '' and nominees or winners
    of the Turner Prize . Damien Hirst for example , designed a label for Beck 's
    in 1995 , showing his famous spots . In 2000 , Tracey Emin created a label , which
    shows herself , posing in a bathtub . Furthermore , Rachel Whiteread designed
    a label in 1993 , presenting her artwork `` house '' , which was also financed
    by Beck 's . The Art Label Campaign has also been parodied by Matthew Higgs ,
    who is a member of the British art collective `` Bank '' . In the Bank exhibition
    `` The Charge of the Light Brigade '' in 1995 , he brewed a beer , called `` Kunstlerbrau
    '' . In 2012 , Beck 's started giving young and independent musicians the opportunity
    to design a label for the Beck 's bottle . Beck 's summer 2009 limited-edition
    labels were designed by the musical groups Hard-Fi and Ladyhawke .
- source_sentence: 'Instruct: Given a web search query, retrieve relevant passages
    that answer the query.

    Query: Ahu A Umi Heiau'
  sentences:
  - The 1967 All-Ireland Intermediate Hurling Championship was the seventh staging
    of the All-Ireland hurling championship. The championship ended on 17 September
    1967.Tipperary were the defending champions, however, they were defeated in the
    provincial championship. London won the title after defeating Cork by 1-9 to 1-5
    in the final.
  - 'The digit ratio is the ratio of the lengths of different digits or fingers typically
    measured from the midpoint of bottom crease ( where the finger joins the hand
    ) to the tip of the finger . It has been suggested by some scientists that the
    ratio of two digits in particular , the 2nd ( index finger ) and 4th ( ring finger
    ) , is affected by exposure to androgens , e.g. , testosterone while in the uterus
    and that this 2D :4 D ratio can be considered a crude measure for prenatal androgen
    exposure , with lower 2D :4 D ratios pointing to higher prenatal androgen exposure
    . The 2D :4 D ratio is calculated by dividing the length of the index finger of
    a given hand by the length of the ring finger of the same hand . A longer index
    finger will result in a ratio higher than 1 , while a longer ring finger will
    result in a ratio lower than 1 .   The 2D :4 D digit ratio is sexually dimorphic
    : although the second digit is typically shorter in both females and males , the
    difference between the lengths of the two digits is greater in males than in females
    .   A number of studies have shown a correlation between the 2D :4 D digit ratio
    and various physical and behavioral traits .'
  - Ahu A ʻ Umi Heiau means "shrine at the temple of ʻ Umi" in the Hawaiian Language.
---

# SentenceTransformer based on dunzhang/stella_en_1.5B_v5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [dunzhang/stella_en_1.5B_v5](https://huggingface.co/dunzhang/stella_en_1.5B_v5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [dunzhang/stella_en_1.5B_v5](https://huggingface.co/dunzhang/stella_en_1.5B_v5) <!-- at revision 129dc50d3ca5f0f5ee0ce8944f65a8553c0f26e0 -->
- **Maximum Sequence Length:** 8096 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8096, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1536, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
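The printout above describes three stages: a Qwen2 transformer producing 1536-dimensional token vectors, mean pooling over tokens, and a bias-enabled linear layer with identity activation that projects to 1024 dimensions. The following is a minimal pure-Python sketch of the last two stages only. The helper names `mean_pool` and `dense`, the toy sizes (4 to 2 instead of 1536 to 1024), and all numeric values are illustrative assumptions rather than the library's implementation or the model's weights.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    count = max(len(kept), 1)  # avoid dividing by zero on an all-padding input
    return [sum(vec[i] for vec in kept) / count for i in range(dim)]


def dense(x: Sequence[float],
          weight: Sequence[Sequence[float]],
          bias: Sequence[float]) -> Vector:
    """Linear projection y = W x + b with no activation (Identity)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Toy input: three token vectors of width 4; the third token is padding.
tokens = [[1.0, 2.0, 3.0, 4.0],
          [3.0, 2.0, 1.0, 0.0],
          [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)                    # one vector per sentence
weight = [[0.5, 0.0, 0.0, 0.0],                     # hypothetical 2x4 weight matrix
          [0.0, 0.25, 0.25, 0.0]]
bias = [0.0, 1.0]                                   # hypothetical bias
sentence_embedding = dense(pooled, weight, bias)    # projected sentence vector
print(pooled, sentence_embedding)
```

In the real model the same two steps run on tensors, and the projected vector is what `model.encode` returns.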

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Ahu A Umi Heiau',
    'Ahu A ʻ Umi Heiau means "shrine at the temple of ʻ Umi" in the Hawaiian Language.',
    'The digit ratio is the ratio of the lengths of different digits or fingers typically measured from the midpoint of bottom crease ( where the finger joins the hand ) to the tip of the finger . It has been suggested by some scientists that the ratio of two digits in particular , the 2nd ( index finger ) and 4th ( ring finger ) , is affected by exposure to androgens , e.g. , testosterone while in the uterus and that this 2D :4 D ratio can be considered a crude measure for prenatal androgen exposure , with lower 2D :4 D ratios pointing to higher prenatal androgen exposure . The 2D :4 D ratio is calculated by dividing the length of the index finger of a given hand by the length of the ring finger of the same hand . A longer index finger will result in a ratio higher than 1 , while a longer ring finger will result in a ratio lower than 1 .   The 2D :4 D digit ratio is sexually dimorphic : although the second digit is typically shorter in both females and males , the difference between the lengths of the two digits is greater in males than in females .   A number of studies have shown a correlation between the 2D :4 D digit ratio and various physical and behavioral traits .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
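The example queries in this card carry an `Instruct: ...\nQuery: ...` prefix, while the passages are passed through unchanged, and similarity is cosine. Below is a small sketch of that convention together with a cosine-based ranking. `format_query`, `cosine`, and `rank` are hypothetical helper names, and the two-dimensional vectors are toy stand-ins for real embeddings; the sketch does not call the model.

```python
import math
from typing import List, Sequence, Tuple

# Instruction text copied from the examples in this card.
TASK = "Given a web search query, retrieve relevant passages that answer the query."


def format_query(query: str, task: str = TASK) -> str:
    """Wrap a raw query in the instruction prefix used by the card's examples."""
    return f"Instruct: {task}\nQuery: {query}"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec: Sequence[float],
         passage_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


prompt = format_query("Ahu A Umi Heiau")

# Toy vectors standing in for model.encode(...) output.
query_vec = [1.0, 0.0]
passage_vecs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank(query_vec, passage_vecs)
print(prompt)
print([index for index, _ in ranking])
```

With real embeddings, `model.similarity(query_embeddings, passage_embeddings)` would take the place of the `cosine` helper.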

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

### Training Logs
| Epoch  | Step | Training Loss | retrieval loss |
|:------:|:----:|:-------------:|:--------------:|
| 0.6466 | 500  | 0.0424        | 0.0060         |
| 1.2932 | 1000 | 0.0073        | 0.0040         |
| 1.9399 | 1500 | 0.0029        | 0.0039         |
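
The card's tags name `MultipleNegativesSymmetricRankingLoss` as the training objective. As a rough illustration of what such a loss measures, here is a pure-Python sketch of a symmetric in-batch-negatives loss: each query should score highest against its own passage (forward direction), and each passage against its own query (backward direction). The function names, the toy vectors, and the `scale=20.0` value are assumptions for illustration; the card does not state the hyperparameters of this run, and this is not the library's code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[target]


def symmetric_ranking_loss(anchors: Sequence[Sequence[float]],
                           positives: Sequence[Sequence[float]],
                           scale: float = 20.0) -> float:
    """Average of query-to-passage and passage-to-query cross-entropy over a batch."""
    n = len(anchors)
    scores: List[List[float]] = [[scale * cosine(a, p) for p in positives]
                                 for a in anchors]
    # Forward: row i should peak at column i.
    forward = sum(cross_entropy(scores[i], i) for i in range(n)) / n
    # Backward: column i should peak at row i.
    backward = sum(cross_entropy([scores[j][i] for j in range(n)], i)
                   for i in range(n)) / n
    return (forward + backward) / 2


queries = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]    # each passage aligned with its query
swapped = [[0.0, 1.0], [1.0, 0.0]]    # passages paired with the wrong query

loss_matched = symmetric_ranking_loss(queries, matched)
loss_swapped = symmetric_ranking_loss(queries, swapped)
print(loss_matched, loss_swapped)
```

Correctly paired batches give a loss near zero and mispaired batches a large one, which is consistent with the falling values in the table above.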





<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->