UniHGKR-base

Our paper: UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers.

See the UniHGKR GitHub repository for instructions on using this model.

We recommend using the sentence-transformers package to load the model and to embed paragraphs and sentences.

The model maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
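
The loading and embedding steps described above can be sketched as follows. This is a minimal illustration, not the authors' reference usage (see the GitHub repository for that). The helper names `load_model` and `rank_passages` are hypothetical, and the model identifier you pass to `load_model` is an assumption: substitute the model's full Hub ID or a local path.

```python
def load_model(model_id):
    """Load a Sentence Transformers model from a Hub ID or local path."""
    # Imported lazily so rank_passages() can be used with any object
    # that exposes a compatible encode() method.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def rank_passages(model, query, passages):
    """Return (score, passage) pairs sorted by descending cosine similarity."""
    # encode() returns one vector per input text. With normalize_embeddings=True
    # every vector has unit length, so a dot product equals cosine similarity.
    vectors = model.encode([query] + list(passages), normalize_embeddings=True)
    query_vec, passage_vecs = vectors[0], vectors[1:]
    scored = [
        (sum(float(a) * float(b) for a, b in zip(query_vec, vec)), passage)
        for vec, passage in zip(passage_vecs, passages)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A call would then look like `rank_passages(load_model("UniHGKR-base"), "your query", ["passage one", "passage two"])`, where `"UniHGKR-base"` is a placeholder for the actual identifier. Any query instructions the model expects are documented in the repository and are not reproduced here.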

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
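
To make the last two modules concrete, here is a small pure-Python sketch of what CLS pooling followed by L2 normalization does to the per-token outputs of the Transformer. It uses a toy 4-dimensional example instead of the real 768 dimensions, and the function names are illustrative only; the actual modules operate on batched tensors.

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep only the vector of the first token ([CLS] in BERT)."""
    return token_embeddings[0]


def l2_normalize(vector):
    """Scale a vector to unit length, as a normalization layer would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def sentence_embedding(token_embeddings):
    """Pooling then normalization, mirroring modules (1) and (2)."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy input: 3 tokens with 4-dimensional hidden states.
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
]
embedding = sentence_embedding(tokens)
print(embedding)
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.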

Training Details

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.0.0+cu118
  • Accelerate: 0.34.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1
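
If you want to compare your environment against the versions listed above, a short standard-library check along these lines may help. The helper name `compare_versions` is hypothetical, and the distribution names are the usual PyPI package names. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in the Framework Versions list of this model card.
TRAINING_VERSIONS = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.44.2",
    "torch": "2.0.0+cu118",
    "accelerate": "0.34.0",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, same) in compare_versions(TRAINING_VERSIONS).items():
    status = "match" if same else "differs"
    print(f"{name}: trained with {wanted}, installed {installed} ({status})")
```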

Citation

If you find this resource useful in your research, please consider liking the model and citing our paper.

@article{min2024unihgkr,
  title={UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers},
  author={Min, Dehai and Xu, Zhiyang and Qi, Guilin and Huang, Lifu and You, Chenyu},
  journal={arXiv preprint arXiv:2410.20163},
  year={2024}
}