--- base_model: BAAI/bge-small-en-v1.5 datasets: [] language: [] library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@5 - cosine_ndcg@10 - cosine_ndcg@100 - cosine_mrr@5 - cosine_mrr@10 - cosine_mrr@100 - cosine_map@100 - dot_accuracy@1 - dot_accuracy@5 - dot_accuracy@10 - dot_precision@1 - dot_precision@5 - dot_precision@10 - dot_recall@1 - dot_recall@5 - dot_recall@10 - dot_ndcg@5 - dot_ndcg@10 - dot_ndcg@100 - dot_mrr@5 - dot_mrr@10 - dot_mrr@100 - dot_map@100 pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:7033 - loss:GISTEmbedLoss widget: - source_sentence: How will the performance of CBBOs be assessed in the third and fourth year? sentences: - ''' (iv) In third and fourth year, performance of the CBBOs will be assessed based on - (a) issuing Share Certificates to each member in third year, if any; (b) audited Financial Statements for FPOs for second year and third year in due time and filing as required; (c) MoU and vendor registration as per Business Plan with Marketing Agencies/Institutional Buyers; (d) trading/uploading of produce in e-NAM/other sources, if any; (e) second tranche equity grant to FPOs, if any; and (f) second tranche of credit guarantee facility, if any . (v) In the fifth year, performance of the CBBOs will be assessed based on (a) audited Statements of accounts of FPO and filing it; (b) 100% of agri-business plan executed and value chain developed; (c) revenue model showing financial growth in last 3 consecutive years; (d) detailed project completion Report; and (e) third tranche of credit guarantee facility if any.''' - '''5. Tussock caterpillar, Notolopus (=Orygyia) postica , Lymantriidae, Lepidoptera Symptom of damage: Defoliation. Nature of damage: Caterpillars of the moth feed on the leaves. Egg: Eggs are laid in clusters on the leaves and covered over with hairs. Larva: Caterpillars are gregarious in young stages. Full grown larva possess a brown head, a pair of long pencil of hairs projecting forwardly from the prothorax, yellowish tuft of hairs arising from the lateral side of the first two abdominal segment and long brownish hairs arising from 8 th abdominal segment. Pupa: Pupation takes place in silken cocoon. Adult: Small adult with yellowish brown wings. Female moth is wingless. Presence of bipectinate antenna.''' - '''The Kisan Credit Card (KCC) scheme was introduced in 1998 for issue of Kisan Credit Cards to farmers on the basis of their holdings for uniform adoption by the banks so that farmers may use them to readily purchase agriculture inputs such as seeds, fertilizers, pesticides etc. and draw cash for their production needs. The scheme was further extended for the investment credit requirement of farmers viz. allied and non-farm activities in the year 2004. The scheme was further revisited in 2012 by a working Group under the Chairmanship of Shri T. M. Bhasin, CMD, Indian Bank with a view to simplify the scheme and facilitate issue of Electronic Kisan Credit Cards. The scheme provides broad guidelines to banks for operationalizing the KCC scheme. Implementing banks will have the discretion to adopt the same to suit institution/location specific requirements.''' - source_sentence: How should State Government disclose ceiling premium rate for a crop in the tender document? sentences: - '''However, in absence of insured area of last year/season for all proposed crops or any crop, net sown area of that crop(s) will be considered for calculation of weighted premium of district. This data will be used for calculation of L1 only. 7.1.5 Bidding **shall be done through e-tendering** and work order may be released within 2 weeks of the opening of the Tender. 7.1.6 Depending on the risk profile, historical loss cost and cost benefit analysis for the proposed crop(s) in district(s) of any cluster, if the State Government feels that the premium rate likely to be offered by bidding Insurance Companies would be abnormally high, then the State Govt. can fix a ceiling on premium rates for such crop(s) proposed to be included in the bidding evaluation for the bidding period. However, recourse to this ceiling provision may be done only in well justified cases and not as a general practice. The ceiling premium rate may be derived based on statistical evaluation/actuarial premium analysis, loss cost, historical payout etc and name of such crop should be disclosed by State Govt. compulsorily in the tender document. 7.1.7 In such cases where a ceiling has been indicated, State government must call financial bids in two step bidding or in two separate envelopes. First bid/envelop is for disclosing the premium rate offered by each participating Insurance Company for such ceiling crops and must be categorised under \''Ceiling Premium Rate\'' and 2nd bid envelop is for bidding of crop wise premium rate for all crops included in tender. Time interval for opening of both bid/envelop should be compulsorily mentioned in the bidding documents and should preferably be on the same day. All participating Insurance Companies have to submit the bid offer as per the procedure mentioned above. 7.1.8 State Govt.''' - '''| Chapters | Particulars | Page No. |\n|---------------|------------------------------------------------------------|-------------|\n| 1 | Concept of Producer Organisation | 1 |\n| 2 | Producer Organisation Registered as Cooperative Society | 15 |\n| 3 | Producer Organisation Registered as Producer Company | 19 |\n| 4 | Producer Organisation Registered as Non-Profit Society | 33 |\n| 5 | Producer Organisation Registered as Trust | 36 |\n| 6 | Producer Organisation Registered as Section 8 Company | 39 |\n| 7 | Business Planning | 42 |\n| 8 | Financial Management | 55 |\n| 9 | Funding Arrangement | 60 |\n| 10 | Monitoring by the PO, POPI and Funding Agencies | 80 |\n| Attachment | | |\n| 1 | Producer Company Act provisions | |\n| 2 | PRODUCE Fund Operational Guidelines | 106 |\n| 3 | SFAC Circular on Promoting / supporting Producer Companies | 114 |\n| 4 | Case Study on Bilaspur Model of PO | 125 |\n| 5 | Indicative Framework of the process of forming a PO | 131 |\n| 6 | References | 138 |\n| 7 | Memorandum of Agreement between NABARD and POPI | 139 |\n| 8 | Memorandum of Understanding between NABARD and RSA | 143 |\n| 9 | | |\n| Abbreviations | | |\n| | | |\n| 146 | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |\n| | | |''' - '''Agro-industries generate residues like husk, hull, shell, peel, testa, skin, fibre, bran, linter, stone, seed, cob, prawn, head, frog legs, low grade fish, leather waste, hair, bones, coir dust, saw dust, bamboo dust, etc. which could be recycled or used efficiently through agro-processing centres. In the last three decades, rice and sugarcane residues have increased by 162 and 172 %, respectively. Their disposal problem needs serious rethinking (Vimal, 1981). To some extent these organic residues are used as soil conditioner, animal feed, fuel, thatching and packing materials. These can also be put to new uses for manufacture of various chemicals and specific products (like silica, alcohol, tannins, glue, gelatine, wax, etc), feed, pharmaceuticals (Iycogenin, antibiotics, vitamins, etc.), fertilizers, energy, construction materials, paper pulp, handicraft materials etc. Residues from fruit and vegetable industries, fish and marine industries and slaughter o straw decrease their efficiency without pretreatment.''' - source_sentence: What is the purpose of using pectolytic enzymes in fruit juice processing? sentences: - '''Aggregating producers into collectives is one of the best mechanism to improve access of small producers to investment, technology and market. The facilitating agency should however keep the following factors in view: a. Types of small scale producers in the target area, volume of production, socioeconomic status, marketing arrangement b. Sufficient demand in the existing market to absorb the additional production without significantly affecting the prices c. Willingness of producers to invest and adopt new technology, if identified, to increase productivity or quality of produce d. Challenges in the market chain and market environment e. Vulnerability of the market to shocks, trends and seasonality f. Previous experience of collective action (of any kind) in the community g. Key commodities, processed products or semi-finished goods demanded by major retailers or processing companies in the surrounding areas/districts h. Support from Government Departments, NGOs, specialist support agencies and private companies for enterprise development i. Incentives for members (also disincentives) for joining the PO Keeping in view the sustainability of a Producer Organisation, a flow chart of activities along with timeline, verifiable indicators and risk factors is provided at Attachment-5.''' - '''2. Sampling method to be adopted – Random Size of the card including area for label and other details = 20 x 30 cmm = 600 cm 2 No. of Grids = 30 Area of each grid = 7 x 2 cm = 14 cm 2 Total No. of eggs / cm 2 to be accommodated = 96,000 – 1,08,000 Mean number of egg / cm 2 of the card in the grid area excluding area for labeling = 200 – 250 Number of counts/ card of size 20 x 30 cm to be taken No. of parasitised eggs = 12 • 3-4 days old parasitised egg card has to be selected for examination • count the number of eggs and eggs parasitised in an area by 1 cm 2 • Per card of size 20 x 30 cm count randomly in 12 positions • Repeat the process for three different cards of same age • Express the per cent parasitisation . The result should fall in range of 85-90 per cent.''' - '''Pectins are colloidal in nature, making solutions viscous and holding other materials in suspension. Pectinesterase removes methyl groups from the pectin molecules exposing carboxyl groups which in the presence of bi- or multivalent cations, such as calcium, form insoluble salts which can readily be removed. At the same time, polygalacturonase degrades macromolecular pectin, causing reduction in viscosity and destroying the protective colloidal action so that suspended materials will settle out. Extensive use of pectolytic enzymes is made in processing fruit juices. Addition of pectic enzymes to grapes or other fruits during crushing or grinding results in increased yields of juice on pressing. Wine from grapes so treated will usually clear faster when fermentation is complete, and have better color.''' - source_sentence: What is the purpose of the PM-Kisan Portal? sentences: - ''' 2) In case of cultivable land in the State of Nagaland which is categorised as Jhum land as per definition under Section–2(7) of the Nagaland Jhum Land Act, 1970 and which is owned by the community/clan/village council/village chieftan, the identification of beneficiaries under PM-Kisan scheme, shall be on the basis of certification of land holding by the village council/chief/head of the village, duly verified by the administrative head of the circle/sub division and countersigned by the Deputy Commissioner of the District. Provided that the name of the beneficiary is included in the state of Nagaland''s Agriculture Census of 2015-16. This proviso shall not be applicable in cases of succession and family partition. The list of such beneficiaries shall be subject to the exclusions under the operational guidelines. 5.6 For identification of *bona fide* beneficiary under PM-Kisan Scheme in Jharkhand, the following proposal of Government of Jharkhand was considered and approved by the Committee: \''The farmer will be asked to submit ''Vanshavali (Lineage)'' linked to the entry of land record comprising his \\ her ancestor''s name giving a chart of successor. This lineage chart shall be submitted before the Gram Sabha for calling objections. After approval of the Gram Sabha, the village level \\ circle level revenue officials will verify and authenticate the Vanshawali and possession of holding. This authenticated list of farmers after due verification of succession chart shall be countersigned by the District level revenue authority. Farmers'' names, subject to the exclusion criterion after following the aforementioned process, shall be uploaded on the PM-Kisan portal along with other required details for this disbursement of benefit under the scheme.\''''' - '''Deep summer ploughing should be done for field preparation for pulses,apply FYM and compost @ 8-10 t/ha and mix well. Sowing of Pigeon pea should be done by the end of June in rows at the spacing of 60-90x15-20 cm. Seed rate should be 12-15 kg/ha Seed should be treated with Carbendazim or Thirum @3g/kg seed Fertilizer dose should be scheduled as per the soil test results. In general, 20-25 kg N, 45-50 kg P and 15-20 kg K and 20 kg S should be given basal. Improved varieties like Chhattisgarh Arhar -1, Chhattisgarh-2, Rajivlochan and TJT-501 should be sown. Soybean and other pulse crops should be sown with proper drainage arrangement. For this seed should be treated with culture before sowing. The quantity of Rhizobium culture@5g + PSB @ 10 g/kg seed should be used for this seed treatment.''' - '''Union Territory. The details of farmers are being maintained by the States / UTs either in electronic form or in manual register. To make integrated platform available in the country to assist in benefit transfer, a platform named **PM-Kisan Portal** available at URL (**http://pmkisan.gov.in**) has been be launched for uploading the farmers'' details at a single web-portal in a uniform structure. 9.2 The PM-Kisan Portal has been created with the following objectives - i) To provide verified and single source of truth on farmers'' details at the portal. ii) Timely assistance to the farmers in farm operation iii) A unified e-platform for transferring of cash benefits into farmer''s bank account through Public Financial Management System (PFMS) integration. iv) Location wise availability of benefited farmers'' list. v) Ease of monitoring across the country on fund transaction details.''' - source_sentence: What should be done before sowing pigeonpea in fields where it is being sown for the first time after a long time? sentences: - '''The sole arbitrator shall be appointed by NABARD in case of dispute raised by NABARD, from the panel of three persons nominated by RSA. Similarly, the sole arbitrator shall be appointed by RSA if dispute is raised by RSA from the panel of three persons nominated by NABARD. The language of the Arbitration shall be English and the arbitrator shall be fluent in English. The arbitrator should be person of repute and integrity and place of arbitration shall be Mumbai.\'' 9. NABARD shall have the right to enter into similar MoU/agreements with any other RSA/Institution. 10. Any notice required to be given under this MoU/Agreement shall be served on the party at their respective address given below by hand delivery or by registered post :''' - '''y Firstly, Treat 1kg seeds with a mixture of 2 grams of thiram and one gram of carbendazim or 4 grams of Trichoderma + 1 gram of carboxyne or carbendazim. Before planting, treat each seed with a unique Rhizobium culture of pigeon pea. A packet of this culture has to be sprinkled over 10 kg of seeds, then mix it lightly with hands, so that a light layer is formed on the seeds. Sow this seed immediately. There is a possibility of the death of culture organisms from strong sunlight. In fields where pigeonpea is being sown for the first time after a long time, it must use culture.''' - '''Organic farming is one of the several approaches found to meet the objectives of sustainable agriculture. Organic farming is often associated directly with, \''Sustainable farming.\'' However, ‘organic farming’ and ‘sustainable farming’, policy and ethics-wise are t wo different terms. Many techniques used in organic farming like inter-cropping, mulching and integration of crops and livestock are not alien to various agriculture systems including the traditional agriculture practiced in old countries like India. However, organic farming is based on various laws and certification programmes, which prohibit the use of almost all synthetic inputs, and health of the soil is recognized as the central theme of the method. Organic products are grown under a system of agriculture without the use of chemical fertilizers and pesticides with an environmentally and socially responsible approach. This is a method of farming that works at''' model-index: - name: SentenceTransformer based on BAAI/bge-small-en-v1.5 results: - task: type: information-retrieval name: Information Retrieval dataset: name: val evaluator type: val_evaluator metrics: - type: cosine_accuracy@1 value: 0.4680306905370844 name: Cosine Accuracy@1 - type: cosine_accuracy@5 value: 0.9092071611253197 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9603580562659847 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.4680306905370844 name: Cosine Precision@1 - type: cosine_precision@5 value: 0.18184143222506394 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09603580562659846 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.4680306905370844 name: Cosine Recall@1 - type: cosine_recall@5 value: 0.9092071611253197 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9603580562659847 name: Cosine Recall@10 - type: cosine_ndcg@5 value: 0.7079399335444153 name: Cosine Ndcg@5 - type: cosine_ndcg@10 value: 0.724527850349024 name: Cosine Ndcg@10 - type: cosine_ndcg@100 value: 0.732682390595948 name: Cosine Ndcg@100 - type: cosine_mrr@5 value: 0.6404518329070746 name: Cosine Mrr@5 - type: cosine_mrr@10 value: 0.6473191450493229 name: Cosine Mrr@10 - type: cosine_mrr@100 value: 0.649235332852707 name: Cosine Mrr@100 - type: cosine_map@100 value: 0.6492353328527082 name: Cosine Map@100 - type: dot_accuracy@1 value: 0.46675191815856776 name: Dot Accuracy@1 - type: dot_accuracy@5 value: 0.9092071611253197 name: Dot Accuracy@5 - type: dot_accuracy@10 value: 0.9603580562659847 name: Dot Accuracy@10 - type: dot_precision@1 value: 0.46675191815856776 name: Dot Precision@1 - type: dot_precision@5 value: 0.18184143222506394 name: Dot Precision@5 - type: dot_precision@10 value: 0.09603580562659846 name: Dot Precision@10 - type: dot_recall@1 value: 0.46675191815856776 name: Dot Recall@1 - type: dot_recall@5 value: 0.9092071611253197 name: Dot Recall@5 - type: dot_recall@10 value: 0.9603580562659847 name: Dot Recall@10 - type: dot_ndcg@5 value: 0.7074679767075504 name: Dot Ndcg@5 - type: dot_ndcg@10 value: 0.7240558935121589 name: Dot Ndcg@10 - type: dot_ndcg@100 value: 0.7322104337590828 name: Dot Ndcg@100 - type: dot_mrr@5 value: 0.6398124467178163 name: Dot Mrr@5 - type: dot_mrr@10 value: 0.6466797588600646 name: Dot Mrr@10 - type: dot_mrr@100 value: 0.6485959466634487 name: Dot Mrr@100 - type: dot_map@100 value: 0.6485959466634499 name: Dot Map@100 --- # SentenceTransformer based on BAAI/bge-small-en-v1.5 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 384 tokens - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) (2): Normalize() ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. ```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("SamagraDataGov/embedding_finetuned") # Run inference sentences = [ 'What should be done before sowing pigeonpea in fields where it is being sown for the first time after a long time?', "'y Firstly, Treat 1kg seeds with a mixture of 2 grams of thiram and one gram of carbendazim or 4 grams of Trichoderma + 1 gram of carboxyne or carbendazim. Before planting, treat each seed with a unique Rhizobium culture of pigeon pea. A packet of this culture has to be sprinkled over 10 kg of seeds, then mix it lightly with hands, so that a light layer is formed on the seeds. Sow this seed immediately. There is a possibility of the death of culture organisms from strong sunlight. In fields where pigeonpea is being sown for the first time after a long time, it must use culture.'", "'Organic farming is one of the several approaches found to meet the objectives of sustainable agriculture. Organic farming is often associated directly with, \\'Sustainable farming.\\' However, ‘organic farming’ and ‘sustainable farming’, policy and ethics-wise are t wo different terms. Many techniques used in organic farming like inter-cropping, mulching and integration of crops and livestock are not alien to various agriculture systems including the traditional agriculture practiced in old countries like India. However, organic farming is based on various laws and certification programmes, which prohibit the use of almost all synthetic inputs, and health of the soil is recognized as the central theme of the method. Organic products are grown under a system of agriculture without the use of chemical fertilizers and pesticides with an environmentally and socially responsible approach. This is a method of farming that works at'", ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 384] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Evaluation ### Metrics #### Information Retrieval * Dataset: `val_evaluator` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.468 | | cosine_accuracy@5 | 0.9092 | | cosine_accuracy@10 | 0.9604 | | cosine_precision@1 | 0.468 | | cosine_precision@5 | 0.1818 | | cosine_precision@10 | 0.096 | | cosine_recall@1 | 0.468 | | cosine_recall@5 | 0.9092 | | cosine_recall@10 | 0.9604 | | cosine_ndcg@5 | 0.7079 | | cosine_ndcg@10 | 0.7245 | | cosine_ndcg@100 | 0.7327 | | cosine_mrr@5 | 0.6405 | | cosine_mrr@10 | 0.6473 | | cosine_mrr@100 | 0.6492 | | cosine_map@100 | 0.6492 | | dot_accuracy@1 | 0.4668 | | dot_accuracy@5 | 0.9092 | | dot_accuracy@10 | 0.9604 | | dot_precision@1 | 0.4668 | | dot_precision@5 | 0.1818 | | dot_precision@10 | 0.096 | | dot_recall@1 | 0.4668 | | dot_recall@5 | 0.9092 | | dot_recall@10 | 0.9604 | | dot_ndcg@5 | 0.7075 | | dot_ndcg@10 | 0.7241 | | dot_ndcg@100 | 0.7322 | | dot_mrr@5 | 0.6398 | | dot_mrr@10 | 0.6467 | | dot_mrr@100 | 0.6486 | | **dot_map@100** | **0.6486** | ## Training Details ### Training Hyperparameters #### Non-Default Hyperparameters - `eval_strategy`: steps - `gradient_accumulation_steps`: 4 - `learning_rate`: 1e-05 - `weight_decay`: 0.01 - `num_train_epochs`: 1.0 - `warmup_ratio`: 0.1 - `load_best_model_at_end`: True #### All Hyperparameters
Click to expand - `overwrite_output_dir`: False - `do_predict`: False - `eval_strategy`: steps - `prediction_loss_only`: True - `per_device_train_batch_size`: 8 - `per_device_eval_batch_size`: 8 - `per_gpu_train_batch_size`: None - `per_gpu_eval_batch_size`: None - `gradient_accumulation_steps`: 4 - `eval_accumulation_steps`: None - `torch_empty_cache_steps`: None - `learning_rate`: 1e-05 - `weight_decay`: 0.01 - `adam_beta1`: 0.9 - `adam_beta2`: 0.999 - `adam_epsilon`: 1e-08 - `max_grad_norm`: 1.0 - `num_train_epochs`: 1.0 - `max_steps`: -1 - `lr_scheduler_type`: linear - `lr_scheduler_kwargs`: {} - `warmup_ratio`: 0.1 - `warmup_steps`: 0 - `log_level`: passive - `log_level_replica`: warning - `log_on_each_node`: True - `logging_nan_inf_filter`: True - `save_safetensors`: True - `save_on_each_node`: False - `save_only_model`: False - `restore_callback_states_from_checkpoint`: False - `no_cuda`: False - `use_cpu`: False - `use_mps_device`: False - `seed`: 42 - `data_seed`: None - `jit_mode_eval`: False - `use_ipex`: False - `bf16`: False - `fp16`: False - `fp16_opt_level`: O1 - `half_precision_backend`: auto - `bf16_full_eval`: False - `fp16_full_eval`: False - `tf32`: None - `local_rank`: 0 - `ddp_backend`: None - `tpu_num_cores`: None - `tpu_metrics_debug`: False - `debug`: [] - `dataloader_drop_last`: False - `dataloader_num_workers`: 0 - `dataloader_prefetch_factor`: None - `past_index`: -1 - `disable_tqdm`: False - `remove_unused_columns`: True - `label_names`: None - `load_best_model_at_end`: True - `ignore_data_skip`: False - `fsdp`: [] - `fsdp_min_num_params`: 0 - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - `fsdp_transformer_layer_cls_to_wrap`: None - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - `deepspeed`: None - `label_smoothing_factor`: 0.0 - `optim`: adamw_torch - `optim_args`: None - `adafactor`: False - `group_by_length`: False - `length_column_name`: length - `ddp_find_unused_parameters`: None - `ddp_bucket_cap_mb`: None - `ddp_broadcast_buffers`: False - `dataloader_pin_memory`: True - `dataloader_persistent_workers`: False - `skip_memory_metrics`: True - `use_legacy_prediction_loop`: False - `push_to_hub`: False - `resume_from_checkpoint`: None - `hub_model_id`: None - `hub_strategy`: every_save - `hub_private_repo`: False - `hub_always_push`: False - `gradient_checkpointing`: False - `gradient_checkpointing_kwargs`: None - `include_inputs_for_metrics`: False - `eval_do_concat_batches`: True - `fp16_backend`: auto - `push_to_hub_model_id`: None - `push_to_hub_organization`: None - `mp_parameters`: - `auto_find_batch_size`: False - `full_determinism`: False - `torchdynamo`: None - `ray_scope`: last - `ddp_timeout`: 1800 - `torch_compile`: False - `torch_compile_backend`: None - `torch_compile_mode`: None - `dispatch_batches`: None - `split_batches`: None - `include_tokens_per_second`: False - `include_num_input_tokens_seen`: False - `neftune_noise_alpha`: None - `optim_target_modules`: None - `batch_eval_metrics`: False - `eval_on_start`: False - `eval_use_gather_object`: False - `batch_sampler`: batch_sampler - `multi_dataset_batch_sampler`: proportional
### Training Logs | Epoch | Step | Training Loss | loss | val_evaluator_dot_map@100 | |:----------:|:-------:|:-------------:|:---------:|:-------------------------:| | 0.0682 | 15 | 0.6463 | 0.3498 | 0.6152 | | 0.1364 | 30 | 0.3071 | 0.1975 | 0.6212 | | 0.2045 | 45 | 0.2023 | 0.1576 | 0.6248 | | 0.2727 | 60 | 0.1457 | 0.1357 | 0.6321 | | 0.3409 | 75 | 0.2456 | 0.1228 | 0.6370 | | 0.4091 | 90 | 0.1407 | 0.1130 | 0.6365 | | 0.4773 | 105 | 0.1727 | 0.1042 | 0.6393 | | 0.5455 | 120 | 0.1311 | 0.0975 | 0.6428 | | 0.6136 | 135 | 0.13 | 0.0910 | 0.6433 | | 0.6818 | 150 | 0.0919 | 0.0872 | 0.6466 | | 0.75 | 165 | 0.1587 | 0.0851 | 0.6490 | | 0.8182 | 180 | 0.1098 | 0.0834 | 0.6481 | | 0.8864 | 195 | 0.1013 | 0.0824 | 0.6461 | | **0.9545** | **210** | **0.1144** | **0.082** | **0.6486** | | 1.0 | 220 | - | 0.0820 | 0.6486 | * The bold row denotes the saved checkpoint. ### Framework Versions - Python: 3.10.14 - Sentence Transformers: 3.0.1 - Transformers: 4.43.4 - PyTorch: 2.4.1+cu121 - Accelerate: 0.33.0 - Datasets: 2.21.0 - Tokenizers: 0.19.1 ## Citation ### BibTeX #### Sentence Transformers ```bibtex @inproceedings{reimers-2019-sentence-bert, title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", author = "Reimers, Nils and Gurevych, Iryna", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", month = "11", year = "2019", publisher = "Association for Computational Linguistics", url = "https://arxiv.org/abs/1908.10084", } ``` #### GISTEmbedLoss ```bibtex @misc{solatorio2024gistembed, title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, author={Aivin V. Solatorio}, year={2024}, eprint={2402.16829}, archivePrefix={arXiv}, primaryClass={cs.LG} } ```