The exact MappingAdapter structure is available in representation_mapping.py.

Mapping "sentence-transformers/stsb-roberta-large"'s hidden representation to "mistralai/Mistral-7B-Instruct-v0.1"'s.

Training:

  • Steps: 114k

  • Gradient accumulation: 2

  • Batch size: 64

  • Warm-up steps: 100

  • Learning rate: 3e-5 with linear scheduling

  • Eval steps: every 8,000 steps

  • Training hours: ~98h

  • Eval hours: ~10h

  • Gradient updates: 57k

  • Train examples: 7.3M

  • Eval examples: 106k

  • Adapter: Decoder_dim (4096) → 4096 → LeakyReLU(0.1) → Encoder_dim (1024) (see the sketches below)
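A minimal PyTorch sketch of a module matching the dimensions listed above; the class name and exact layer layout here are assumptions, and the authoritative definition is the one in representation_mapping.py:

```python
import torch
import torch.nn as nn

class MappingAdapter(nn.Module):
    """Two-layer MLP mapping decoder-space vectors (4096-d, Mistral-7B)
    to encoder-space vectors (1024-d, stsb-roberta-large).
    Sketch only; the actual class lives in representation_mapping.py."""

    def __init__(self, decoder_dim: int = 4096, encoder_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(decoder_dim, decoder_dim),  # 4096 -> 4096
            nn.LeakyReLU(0.1),                    # negative slope 0.1
            nn.Linear(decoder_dim, encoder_dim),  # 4096 -> 1024
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.net(hidden)
```

Applied to a batch of decoder hidden states of shape (batch, 4096), the forward pass returns (batch, 1024) vectors in the encoder's embedding space.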
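The training schedule above can be wired up with standard Hugging Face utilities. In the sketch below, only the learning rate, warm-up steps, update count, accumulation factor, batch size, and eval interval come from this card; the choice of AdamW as the optimizer is an assumption:

```python
import torch
from transformers import get_linear_schedule_with_warmup

adapter = MappingAdapter()  # module sketched above

# AdamW itself is an assumption; lr and schedule are from the card.
optimizer = torch.optim.AdamW(adapter.parameters(), lr=3e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,       # warm-up steps
    num_training_steps=57_000,  # gradient updates (114k steps / accumulation 2)
)

batch_size = 64   # per-step batch size
grad_accum = 2    # gradients accumulated over 2 batches before each update
eval_every = 8_000  # evaluation every 8k steps
```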

