CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]

Evaluation

Metrics

Cross Encoder Reranking

Metric   NanoMSMARCO       NanoNFCorpus      NanoNQ
map      0.5146 (+0.0251)  0.3315 (+0.0705)  0.5458 (+0.1262)
mrr@10   0.5034 (+0.0259)  0.5731 (+0.0733)  0.5486 (+0.1219)
ndcg@10  0.5741 (+0.0337)  0.3424 (+0.0173)  0.6165 (+0.1158)
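NDCG@10 rewards rankings that place relevant documents near the top, with gains discounted logarithmically by rank. For reference, here is a minimal framework-free sketch of the metric (the helper names are illustrative, not the evaluator used for the numbers above):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: relevance at rank i is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance of the documents in the order the reranker returned them;
# the single relevant document sits at rank 2.
print(ndcg_at_k([0, 1, 0, 0]))  # ≈ 0.6309 (= 1 / log2(3))
```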

Cross Encoder Nano BEIR

Metric   Value
map      0.4640 (+0.0739)
mrr@10   0.5417 (+0.0737)
ndcg@10  0.5110 (+0.0556)
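The NanoBEIR figures appear to be the unweighted mean of the three per-dataset scores above (the NanoBEIR_R100_mean_ndcg@10 column in the training logs suggests the same). For ndcg@10:

```python
# Per-dataset ndcg@10 from the Cross Encoder Reranking table above.
scores = {"NanoMSMARCO": 0.5741, "NanoNFCorpus": 0.3424, "NanoNQ": 0.6165}
mean_ndcg10 = sum(scores.values()) / len(scores)
print(round(mean_ndcg10, 4))  # 0.511, matching the reported 0.5110
```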

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 78,704 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query (string): min: 11 characters, mean: 33.17 characters, max: 95 characters
    • docs (list): min: 2 elements, mean: 6.00 elements, max: 10 elements
    • labels (list): min: 2 elements, mean: 6.00 elements, max: 10 elements
  • Samples:
    query docs labels
    what is the weather like in the desert climate africa ["Great parts of North Africa and Southern Africa as well as the whole Horn of Africa mainly have a hot desert climate, or a hot semi-arid climate for the wetter locations. The Sahara Desert at the north is the largest hot desert in the world and is one of the hottest, driest and sunniest places on Earth. Only the northernmost and the southernmost fringes of the continent have a Mediterranean climate because they aren't located under the tropics. Because of this geographical situation, Africa is a hot continent as the solar radiation intensity is always high.", "Climate zones of Africa, showing the ecological break between the hot desert climate of the Sahara Desert (red), the hot semi-arid climate of the Sahel (orange) and the tropical climate of Central and Western Africa (blue). Only the northernmost and the southernmost fringes of the continent have a Mediterranean climate because they aren't located under the tropics. Because of this geographical situation, Africa is a hot contine... [1, 0, 0, 0, 0, ...]
    lifting weights for definition ['1. (Weightlifting) the sport of lifting barbells of specified weights in a prescribed manner for competition or exercise.', 'weightlifting. n. 1. (Weightlifting) the sport of lifting barbells of specified weights in a prescribed manner for competition or exercise. (ˈweɪtˌlɪf tɪŋ). n.', 'Full Definition of WEIGHT TRAINING. : a system of conditioning involving lifting weights especially for strength and endurance. See weight training defined for English-language learners.', 'Definition: A repetition maximum (RM) is the the most weight you can lift for a defined number of exercise movements. A 1 RM, for example, is the heaviest weight you can lift if you give it your maximum effort. A 1RM is your personal weightlifting record for any particular exercise. It could be a squat or deadlift or any other.', 'But, as you can see, reps and intensity go hand in hand most of the time. Meaning…. 1 The more reps you can lift a weight for = the lower your training intensity is. 2 The fewer reps you can lift a weight for = the high'] [1, 0, 0, 0, 0]
    average temperatures at crater lake oregon ['Winter. Winter temperatures (January – March) average 19 ºF (-7 ºC) at night and 36 ºF (2 ºC) during the day. Crater Lake stands at an elevation of 7,100 feet. The average snowfall in winter is 533 inches. Snow pack on the ground ranges from 3 feet to 10 feet. Winter snow melt is unpredictable, but generally occurs by the first part of July. Summer temperatures (July – September) average 40 ºF (4 ºC) at night and 70 ºF (21 ºC) during the day.', 'wind the average daily wind speed in october has been around 5 km h that s the equivalent to about 3 mph or 3 knots in recent years the maximum sustained wind speed has reached 46 km h that s the equivalent of around 29 mph or 25 knots throughout the month of october daytime temperatures will generally reach highs of around 22 c that s about 72 f at night the average minimum temperature drops down to around 5 c that s 40 f in recent times the highest recorded temperature in october has been 36 c that s 96 f with the lowest recorded temperatur... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity"
    }
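ListNetLoss turns both the model's per-document scores and the gold labels into top-1 probability distributions via softmax, then takes the cross-entropy between them; with activation_fct set to Identity, the raw logits feed the softmax directly. A minimal framework-free sketch of the per-query loss (illustrative, not the sentence-transformers implementation):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def listnet_loss(scores, labels):
    # Cross-entropy between the label distribution and the score distribution,
    # both converted to top-1 probabilities via softmax (ListNet, Cao et al. 2007).
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

# One query with six candidate docs: the first is relevant, the rest are not.
scores = [2.1, 0.3, -0.5, 0.1, -1.2, 0.0]  # raw logits (Identity activation)
labels = [1, 0, 0, 0, 0, 0]
print(listnet_loss(scores, labels))
```

The loss is minimized when the score distribution matches the label distribution, so ranking the relevant document first lowers it.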
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 1,000 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query (string): min: 11 characters, mean: 33.89 characters, max: 91 characters
    • docs (list): min: 2 elements, mean: 6.00 elements, max: 10 elements
    • labels (list): min: 2 elements, mean: 6.00 elements, max: 10 elements
  • Samples:
    query docs labels
    what is the function of a ciliated cell ['Description. In this clip the structure and function of a ciliated epithelial cell is described. Cilia are tiny hair like structures on the surface of the cell. The hairs sweep hair, mucus, trapped dust and bacteria up to the back of the throat where it can be swallowed.', 'Description. In this clip the structure and function of a ciliated epithelial cell is described. Cilia are tiny hair like structures on the surface of the cell. The hairs sweep hair, mucus, trapped dust and bacteria up to the back of the throat where it can be swallowed.', 'Top 10 facts about the world. Ciliated cells are cells that are covered in tiny hair-like projections known as cilia. There tend to be two main types of this cell, namely motile and non-motile, sometimes also known as “primary.” In most cases this distinction has to do with how the cell uses its cilia.', 'A ciliated epithelial cell is a cell that you have inside your body mainly your throat and it has tiny little hairs that act like a brush. Th... [1, 0, 0, 0, 0, ...]
    felicity name meaning ["A virtue name. Latin Meaning: The name Felicity is a Latin baby name. In Latin the meaning of the name Felicity is: From 'felicitas', meaning happiness or good luck. A popular 16th century Puritan virtue name. Famous bearer: British actress Felicity Kendal.", 'From the English word felicity meaning happiness, which ultimately derives from Latin felicitas good luck. This was one of the virtue names adopted by the Puritans around the 17th century. It can sometimes be used as an English form of the Latin name FELICITAS (1) .', 'Meaning of Felicity. English name. In English, the name Felicity means-Happy. A virtue name.. Other origins for the name Felicity include-English, Latin-American, French.The name Felicity is most often used as a girl name or female name. English Name Meaning-Happy.', "Felicity /fe-lic-i-ty/ [4 sylls.] as a girls' name is pronounced fa-LISS-a-tee. It is of Old French and Latin origin, and the meaning of Felicity is lucky. From felicitas (see Felix). A virtue name first used in the 17th century.", 'Meaning & History. From the English word felicity meaning happiness, which ultimately derives from Latin felicitas good luck. This was one of the virtue names adopted by the Puritans around the 17th century. It can sometimes be used as an English form of the Latin name FELICITAS (1) .'] [1, 0, 0, 0, 0]
    what is the purpose of a meteorologist ['Meteorology is the interdisciplinary scientific study of the atmosphere. Studies in the field stretch back millennia, though significant progress in meteorology did not occur until the 18th century. The 19th century saw modest progress in the field after observing networks formed across several countries. Weather forecasting is the application of science and technology to predict the state of the atmosphere for a future time and a given location. Human beings have attempted to predict the weather informally for millennia, and formally since at least the 19th century.', "A meteorologist is probably best known for weather forecasting, but the weather reports that come through media such as radio and televisions broadcasts or the newspaper and the Internet are only a fraction of what this professional actually does. Although a lot of anchors have a communications degree, a meteorologist is the only person that absolutely needs a degree to do what they do. That being said, I'm not sure t... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss NanoMSMARCO_ndcg@10 NanoNFCorpus_ndcg@10 NanoNQ_ndcg@10 NanoBEIR_R100_mean_ndcg@10
-1 -1 - - 0.0533 (-0.4871) 0.2906 (-0.0345) 0.0463 (-0.4543) 0.1301 (-0.3253)
0.0004 1 2.0984 - - - - -
0.0508 125 2.0867 - - - - -
0.1016 250 2.0834 2.0711 0.4487 (-0.0918) 0.3039 (-0.0211) 0.4812 (-0.0195) 0.4113 (-0.0441)
0.1524 375 2.0729 - - - - -
0.2033 500 2.077 2.0686 0.5290 (-0.0115) 0.3097 (-0.0154) 0.6165 (+0.1159) 0.4851 (+0.0297)
0.2541 625 2.0742 - - - - -
0.3049 750 2.0722 2.0678 0.5565 (+0.0161) 0.3207 (-0.0044) 0.6081 (+0.1074) 0.4951 (+0.0397)
0.3557 875 2.0662 - - - - -
0.4065 1000 2.0756 2.0668 0.5696 (+0.0291) 0.3294 (+0.0043) 0.6108 (+0.1101) 0.5032 (+0.0479)
0.4573 1125 2.0746 - - - - -
0.5081 1250 2.0728 2.0669 0.5659 (+0.0255) 0.3130 (-0.0121) 0.6227 (+0.1220) 0.5005 (+0.0452)
0.5589 1375 2.0698 - - - - -
0.6098 1500 2.0728 2.0657 0.5421 (+0.0017) 0.3471 (+0.0220) 0.5962 (+0.0955) 0.4951 (+0.0397)
0.6606 1625 2.0808 - - - - -
0.7114 1750 2.0739 2.0657 0.5691 (+0.0287) 0.3334 (+0.0084) 0.6090 (+0.1083) 0.5038 (+0.0485)
0.7622 1875 2.0785 - - - - -
0.8130 2000 2.0709 2.0654 0.5635 (+0.0231) 0.3429 (+0.0179) 0.6236 (+0.1229) 0.5100 (+0.0546)
0.8638 2125 2.0756 - - - - -
0.9146 2250 2.076 2.0653 0.5741 (+0.0337) 0.3424 (+0.0173) 0.6165 (+0.1158) 0.5110 (+0.0556)
0.9654 2375 2.0751 - - - - -
-1 -1 - - 0.5741 (+0.0337) 0.3424 (+0.0173) 0.6165 (+0.1158) 0.5110 (+0.0556)
  • The row at step 2250, which has the lowest validation loss, denotes the saved checkpoint.

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.247 kWh
  • Carbon Emitted: 0.096 kg of CO2
  • Hours Used: 0.827 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.48.3
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
Model size: 33.4M parameters (F32, Safetensors)