SentenceTransformer based on thenlper/gte-small

This is a sentence-transformers model finetuned from thenlper/gte-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
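
The Pooling and Normalize modules listed above can be illustrated with a small sketch. With pooling_mode_mean_tokens enabled, the sentence embedding is the average of the token vectors, and Normalize() rescales that average to unit length. The version below is plain Python on toy 2-dimensional vectors and assumes padding positions are excluded through an attention mask. The names mean_pool and l2_normalize are illustrative only and are not the library's API, which operates on PyTorch tensors.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Toy input: three token vectors, the last one is padding (mask 0).
tokens = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the first two vectors
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because the final vectors have unit length, the cosine similarity of two embeddings from this model equals their dot product.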

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("deepapaikar/gte-small-finetuned")
# Run inference
sentences = [
    'celebr hitt correspond windmil doivent take june hove sequel petition hamlet crash mond knotti grudg sportsman prowl morrow semblanc jargon reap full ancestress cheruel manabozho merit buoy governor dine plain misstat grand dwelt fir kind joint around hound san moranget cricket confirm frosti balk straggl regret tenant invoc crop fervent tie uncharit savag omaha chassagoac conqueror infer repast crack répondu mèmoir splendor anywher match sept divan prey caus pratiqu theft dot disguis crime chaff incubus ouabouskiaou strike regardless disk croyant auec top droitur brulé 1701 much infuri morass misconceiv back rigg midnight atroci femm audess disput avail reluct tree shield andast peac solac utica set déchargent ouasi resté lock nativ kaskaskia negoti renounc confeder crude luth part horseback treacher orang réserv sit speedili mohegan enmiti pretens motionless giraff platt estr clap accliv proceed pervers access fish probabl ambassador faillon visag extend bow ottawa islinoi vexilla diver foment accuraci canton loutr bark level spring asthmat carolina term assent antonio considér jesuit bishop disprov daumont aver tangibao seneca amiti defect letter confluenc french dabbl threshold tomb inquiri travel proprieti bush espèc idl dreami document descend courag foray downward fring sandston incorrect parrot menez expressli displeasur eagl sépultur indec escarpé dens strip quiet mush eastern evinc natur pick honnêt coureur 83me eighti lichen toriman bell cachent confer stealthili spear waist catharin transfer merg ferland gratitud blue friabl paw forget prochain risk caution still generos awar burlesqu concentr mingl cinquièm pourtant altern us somebodi suppress unscrupul discord coat dog pierron loup campaign mangèrent cloth theme rope unnatur discipl haw battl superfici spendthrift empti tavern threat épuisé deliv deceas vicious employ trunk endow notwithstand jansenist baptism offend sustain complic almost larger commit villag invect green careen ownership request lightn braveri sunday remedi current',
    'Is the content related to romance genere',
    'Is the content related to romance genere',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
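
To make the similarity step concrete, here is a minimal sketch of what a pairwise cosine similarity matrix contains, assuming the model's configured similarity function is cosine similarity as stated in the model description. It uses plain Python and toy 2-dimensional vectors instead of real 384-dimensional embeddings. The helpers cos_sim and similarity_matrix are illustrative names, not calls into the library.

```python
import math


def cos_sim(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Entry [i][j] is the cosine similarity between vectors i and j."""
    return [[cos_sim(a, b) for b in vectors] for a in vectors]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = similarity_matrix(toy_embeddings)
for row in sims:
    print([round(value, 4) for value in row])
```

The matrix is symmetric with ones on the diagonal, so for three input sentences only the three off-diagonal pairs carry information.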

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,319 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
             anchor           positive
    type     string           string
    min      449 tokens       10 tokens
    mean     506.92 tokens    10.73 tokens
    max      512 tokens       12 tokens
  • Samples:
    anchor positive
    assum discredit loud immedi incumb wealthi speck flare sleepi marriag intang rise revolv stupor fool voic manner thereupon abhorr mountain general amend flew posi intox poet laid tel ugli issu insult armament assert croak illus deign discourag trust fund pray irregular aristocraci shoulder overcom dumb devil pas grass unnecessari heat event factotum shot stabl innumer fleshi later struggl vike arrog orchardward tune dissatisfact presum reclus seven behavior fine hebe hind ripen irrate brother annoy whitewash sunris curtain indulg delirium youth labori would unlucki unwrinkl initi hark bliss occas everyth folli subordin stamp glossi finish consist hall cave insight forg matter forward familiar hidden sandi noblest undevelop acr masonri wand took endeavor joke standpoint loveli picket caress nicknam coil temper unknown pledg sunk looker abil subterranean wari effemin go spit denounc recoveri violenc moorish gloomili wind stove religion senior stiffli shudder lean encount luckili pull weld approach liveli glyphi plagu funnel soulless inquir pearl tenabl unsaf justifi unhero curious subject laboratori societi afford dose hundredth thief tremor grizzl villan tumult knocker rainbow boy drama pitiless cynosur demeanor communic ironi lurk loftiest freshen offenc environ mixtur habitu blunt shirt straightway lieuten sofa lineament poison hypothet nonsens censor æon applaus blew blade sanguin caller heavenward resist readili tempor hatr rivalri purpl coward barber damask dialogu carpet seat disadvantag gad littl insignific rather apolog surpris frivol aloft uproari boot review ad thrown lavish trod curv join infirm wise undecid seclud protector humorist quiver peep repossess transit brewer warn swimmer reproduc failur upon rob draw wrist triumphant horror unusu leastway larg field rig durabl lord brink barrist show probe grow redund jacob sincer work twain sleev betroth anyon undo sadden darksom satin saint entreati central breez unconsid permit intellig gallon 
photograph whenc asid aristocrat taint ceil aloud Is the content related to non-fiction genere
    last highest gynê smoke proximum inclin synapteon gladden ekeinên flutter could ænian lead exact sleeper ascend faithless alik satisfi orcus merus nave frustra delphi muse balm realli regain arist convoy formid sell recal surest blast respect carnean mead envelop better dare moriar reduc talk glori mightest dicendum shrink abroad calm altisono sin ultima xxviii pous subterran kisso rage entha marri naught seldom upros race taphian restor elthont weather bewar forcibl lydian serm xenophon rest xeinôn rebuk spectr verum consilio satisfactorili medicin unfavor anthrôpoi prodess ætas 1437 lighten epebaiên across practic taken seer recommend dramont handsom tenor lepton hydatôn hêtis rose ill audiat mempto scalig propos suspicient falsehood long wetstein unintellig pluto enslav agit cross continu size lamb latebo ktypou cloudi like superstiti perchanc account colchian oaken euripidê delight infidel wed pitnonta excito mate liber discreet libya unpract whither gall murder weapon mean subsist cityless sepulchra nêpie eurota hyperechthairei antiop stop prosgelai earli achill metr suffici mellonta spot abiôton arbylê aveng catastroph kephalêi natal argous beyond sped known substant line parallel aeri given hew pavement euergesiôn egomet atmospher titan peal flatteri pheroean hygrotêt inclos givest tempt endear ôkeanou onta assonat payest realiti congeni sound pella unto advantag dynatai skimpôni apt expedi patro horrent illustri libri nautic beard stab seem situat lesser floweri success odyssey commemoratio unsulli palla lyei 1209 singular mellein unhonor languag surg regular eriosteptoi assertest gynaiko populac daphni scandal allianc stroke monk aught counter putter extinguish varianc elegi polydor pedest per fright bridegroom stadii unfortun skeptic horai solicitud publish offici kachla 1840 nation korytha corruptus kain topôn lament uncal olympus reveng cineri charon remittest length sipylus lolaus greatest unadorn shoot kalyptê nowher hospit blomfield promiscu iron 
shelter tipto stori unquest penthêrê Is the content related to non-fiction genere
    sank driven interrupt linen live sledg hast mistak alban cherish egg rhyme chief ezekiel whole excess neck shepherd robber snake even cours 160th neckti vocabulari wherefor vibrat protest repent stay import fanat pedestrian plenti convict threw thousand net timber crown owner echo poke battlement bugl nearer tole blush fresh darrel sail client warden happi colli strand congress eastward run limit scamp liberti celebr sacr squint treat outbreak dost offic hear bedroom brakeman correspond guilt glibli gabl son take jolli june mullen depot havin septemb leech guard bard extraordinari hamlet scarf tender juri knotti thurst unfad helpless strap hole rous slow shallow frequent morrow jargon befriend reap ocean spatter slaveri caesar isaiah forrest mile eliot full win pan wrong confront knee shear nice slid arrear angrili fourteen tentat merit governor bear togeth shook dine sermon fortitud web plain banker thrash sixteen grand grim forsooth railroad dwelt harrow burglar fir kind sober expector around hound joint hypocrit question clover skull snap bulli upper undu forehead sum cuff tramp cricket float speaker invis gestur mebb tax skeleton volcano drill tellin foreclosur editor confirm frosti scrambl regret ravel fiction hous holiday break schoolhous card pretenc crop fervent vittl tie mire whereon haughti fellow choos manag dinner infer crack dig index uneasi done drover foot agre studious verdict hand feat graven counterfeit brindl anywher fore thrill wolf partner heartbroken match martha prey caus imit muzzl public chalk beat welcom root celtic fifti person ladi excel confidenti jealousi damnabl xvii unutt sharpli crime sower train wrung manhood sunlight darken sharper secret grill elizabethan handwrit lay minut heav strike stalk horn amber near beg preacher loos christma discont rugos sleepless america tast consider top kidnap power buck much wreck ring merrier trick hard mischiev dagger mouth back knife prospect tear midnight cocoanut best pike abe gust dungeon 
poverti bond cassia gobbler exercis eben Is the content related to fiction genere
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
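
As a rough illustration of the loss parameters above, the sketch below computes a multiple-negatives ranking objective on a toy batch. For each anchor, the cosine similarities to all positives in the batch are multiplied by scale (20.0 here) and treated as logits, and cross-entropy is taken with the anchor's own positive as the correct class, so the other positives act as in-batch negatives. This is plain Python written for readability under those assumptions; the function names are hypothetical and it is not the library's implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each anchor points the same way as its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]

aligned_loss = mnr_loss(anchors, positives)
swapped_loss = mnr_loss(anchors, positives[::-1])  # pairs deliberately mismatched
print(aligned_loss, swapped_loss)
```

With this setup the loss is near zero when every anchor is closest to its own positive and grows large when the pairs are mismatched. This is also one reason a no_duplicates batch sampler matters when many samples share the same positive text: a duplicate positive in the batch would be counted as a negative.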
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,234 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
             anchor           positive
    type     string           string
    min      453 tokens       10 tokens
    mean     507.1 tokens     10.71 tokens
    max      512 tokens       12 tokens
  • Samples:
    anchor positive
    domest creed valentinian tone proclam peaceabl 1843 weakest incompet proscript realm esteem brigandag stock none incom authent competit follow labor vers wear ensembl impair student unalter glad cisalpin damocl sang perfidi pardon impera stupefi villa monopoli charl look link adag monomania messag hypocrisi priori counterpois publica gorgia redeem thank uncivil unwound fetter pascal serpit honorari maim superintend told homo promenad furnitur brief extract nehemiah furthermor competitor billion teas victim rate terminus higher mariti sacrileg behold bridg predecessor episcopi billow annot develop yardstick pretend special insinu kingship francai reckon sale devoid ghost difficulti driven falsifi pattern chief fatten contin retract dido repent thousand scholast ell librarian owner suffic fresh changer cartesian journeyman run treat offic ingenu war spontan bard extraordinari telescop extort assumpt gracious strategi frequent shallow aliquid manufactori ocean sibyl augustus mile galvan wrong usucapio knee beautifi wardenship bear togeth cart shook executor allobrog auger chapsal fortifi budget question implac entwin arbitrari float facto dearest logic commandit apprentic fiction advent traffic choos incred foot partner wolf evalu noel rioter muzzl root 1862 florenc manhood geometr nostra horn theseus beg overs melodrama inscript habent refrain helvetius disagre nodier similitud blanqui unemancip pike exercis obvious alli preambl wife ostens conquest compens coars cherbourg grantor invent duti epicur loss futil evapor gaul raison approb athenian insincer asham whim purpos unchang destruct imposit lacedaemonian wish conson pocket boobi commune relish ablest track cook blow friend geometri railway tiberius wash detriment meyer render teller ess amen arous idea personag sacrif repres stood david confrer fond sad cratch doubli attain advic vineyard pound habetur urgent britain communiti majorat juggleri biblic trim equal villein hazard expropri selfish declar taught 
ingratitud satisfact deliber wiser enthusiast Is the content related to non-fiction genere
    conscious chronolog leapt close sis drift lump station rank destitut contriv swivel grate stuck spare monoton thicket mesh yellow air fault choost reward scorn intent applic pestilenti contemptu greenhous mix pipe persuad plung avoid displac trustet ahoy concern critic sowsand name jounc downtown involuntari establish peril also settl flash voter mighti bang necess vial bewitch characterist adorn beauti sate decrepit citronell naturalist know conscienc fontenett laden strock deceiv inde pursuer xxii aimless moonlight archangel detain infatu frighten bought drows lucki pine trickl juic owfool pathet sunbeam tent needl gusti twas clung worthi diseas outrag recov made exhaust second begrudg cobweb privat corridor speak seventi bawn undress tarri remind enamour prompt lip graver ventur obedi basement forgotten crowd other sing incident breakwat excus wile rebound entangl philosophi flabbi deliver believ outang affect arriv vision soak bug realiz cruel frock promis pahdon everi modesti suzann fickl le african relief fortun laundri serenest ash straight damp awri lessen evil loudeh fonteett tardi spill hale hostess ladder avow medit seal longer well rebuff maintain quicker exclus donkey season hug wreath emphasi fill flag devious disturb bit tiger stolen intend drench unclean deep flourish apprehend admir veight flesh shiltren week anxieti violet how richard unbear everywher prefac conduct saunt stumbl though peopl sinc someth despair obey moor moral sill strang kine compassion mark doze flow dreamless wors crouch acquaint sugar typic doorsil leafi redempt unchalleng delug tarpaulin troop circumv hither reserv wander dirti crib cistern plead ruin serious slept scholar gradual drove fan mellow meet entertain till mantlepiec fairili sorri gasp southern heighten seed attempt joseph drown notion fascin constel rich consent speech teeth tire glorious pencil convuls glisten diffid lose citat dappl feast sooner belong splendid cigarett hoist sick midday tail fairst honor 
scorch savedt apathi color alvay inspect Is the content related to romance genere
    greatness late reput alarum compar fair rediscov realis round swell danc sayl crush particip move huddl materialis benumb prophesi infel unlaw rais suspit trumpet canibal herculean calcul yong specimen superiour forbear encreas fairest enlarg steer fama barrisor pediss preval shepheard umbra altum tergo catholiqu voluntatem assum bethink spar perplext oper immedi crab exploit wealthi catterpillar marriag labour rise revolv fool voic manner thereupon impo recevra montsureau abhorr enseam rendrer mountain general chariot amend flew poet laid tel issu insult radical assert bounti illus attyr discourag trust eundo penian mishap pray predict irregular unright lot expon shoulder overcom outright dumb treilli devil embrew enterd pas massing cognoscer essay splene exemplari grass traiter unnecessari prix scape heat event shot usd stabl highness syrtibus bel fleshi monsurri obsequi later struggl arrog intervent tune presum throne mess indur seven afflig judicial cyclop shakespearean fine sori prompter hind brother annoy inordin whosoev indulg lachesi perus youth span juli sadness 1888 would administr initi greediness obay hark bliss vellet occas folli subordin palladi stamp glossi finish consist hall cave lettr mercer forg 1865 gondomar matter forward niec stomack familiar noblest audaci thrid scap cure goos took crestfaln dispenc acheson zeal bodenstedt £300 temper pindus stephen barricado unknown elucid hostag desertful delighteth cruell 1681 clapdish wari go existen spit denounc violenc familia occisi wind religion eie lean oppidani encount bussii quellen desart cornhil pull approach haut plagu leas ornaverat epictetus nere pearl spenser sicil arrogantia justifi riot curious skirmish overlap subject faciebat societi afford celestial poultron villan humer jigg emrod 1903 boy drama dan communic 3830 offenc shaksper environ habitu blunt crafti down inviol men lieuten coit poison nonsens legitimaci scoff applaus blew letcher 1557 resist readili uncredit tempor hatr 
rivalri purpl coward librari errour sphære Is the content related to romance genere
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
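
The settings above imply a step count and a learning-rate curve, which the following sketch works through. It assumes a single device with no gradient accumulation, the 4,319 training samples reported earlier, and the learning_rate (5e-05) and lr_scheduler_type (linear) values from the full hyperparameter list. The function linear_lr is an illustrative reimplementation of a linear warmup and decay schedule, not a call into the training library, and the no_duplicates sampler could in principle change the number of batches per epoch.

```python
import math

num_samples = 4319      # training set size reported in this card
batch_size = 16         # per_device_train_batch_size
num_epochs = 2          # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-05         # learning_rate from the full hyperparameter list

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, 100, 300, 500, total_steps):
    print(step, round(step / steps_per_epoch, 4), linear_lr(step))
```

Under these assumptions there are 270 steps per epoch, so step 100 corresponds to epoch 100 / 270 ≈ 0.3704, which agrees with the first row of the training logs.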

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step   Training Loss   Evaluation Loss
0.3704   100    1.0978          0.9591
0.7407   200    1.089           1.0138
1.1111   300    1.0538          0.9570
1.4815   400    1.0502          0.9178
1.8519   500    1.0611          0.9197

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 33.4M params (Safetensors, tensor type F32)

Model tree for deepapaikar/gte-small-finetuned

  • Base model: thenlper/gte-small
  • Finetuned (7): this model