Model Card: APEX-Embedding-7B [NON-COMMERCIAL USE ONLY]
Fifth Dimension
Read our paper on arXiv here: https://arxiv.org/abs/2410.18105
Model Overview
APEX-Embedding-7B is a 7-billion parameter model optimized for Factual Document Retrieval in Retrieval-Augmented Generation (RAG) systems. During training, the model was enhanced using Structured Entity Relationship Maps and Model-Aware Contrastive Sampling to focus on factual accuracy. The final model is highly effective for generating precise text embeddings, especially for industries that rely on large-scale document retrieval tasks such as legal, compliance, and real estate.
The model achieved 90.86% rank@1 accuracy in our document retrieval evaluation against comparable models, ensuring reliable and accurate retrieval of relevant documents from large datasets.
License & Disclaimer
License: Creative Commons Attribution Non Commercial 4.0 License
This release is for non-commercial research purposes only and has been published in support of an academic paper. The model has been fine-tuned for real-world Factual RAG tasks and has not been designed or evaluated for general-purpose embedding tasks, such as those found in MTEB or other benchmark suites. The model is provided 'as is' without warranties of any kind, and the authors accept no responsibility for any consequences resulting from its use. Users are responsible for ensuring compliance with the license terms and local laws, and for verifying the model's technical suitability for their specific use case.
How to Use
The following steps show how to generate embeddings and compare the similarity between queries and documents.
Environment Setup
First, install the necessary libraries:
pip install torch transformers peft accelerate numpy
pip install -U bitsandbytes
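Before loading the model, you can optionally confirm that a CUDA device is visible; the 4-bit configuration below relies on bitsandbytes, which expects a GPU. (This check is a small illustrative sketch, not part of the original workflow.)

import torch

# Optional sanity check: bitsandbytes 4-bit inference expects a CUDA GPU.
if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device found; 4-bit loading will fail without a GPU.")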
Loading the Model
Here is the code to load the model:
(A pre-quantised version of the base model is available in the base directory.)
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

model_path = "5DAI/APEX-Embedding-7B-v0.1"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 4-bit NF4 quantisation with double quantisation keeps the 7B model within a single GPU's memory
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModel.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto"  # quantised weights are placed on the GPU at load time
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

Note that 4-bit bitsandbytes models do not support model.to(device); device placement is handled by device_map when the model is loaded.
Generating Embeddings for Queries and Documents
When generating embeddings, use the following instruction prompt for both queries and documents:
def get_embedding(text: str, model, tokenizer):
    prompt = f"Instruction: Please perform a RAG search based on the following. Text: {text}"
    inputs = tokenizer(prompt, return_tensors="pt", max_length=8192,
                       padding=True, truncation=True).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Last-token pooling: the hidden state of the final token summarises the sequence.
    # (If you batch several texts at once, use left padding so the last position is a real token.)
    embedding = outputs.last_hidden_state[:, -1, :]
    # L2-normalise so that a plain dot product equals cosine similarity
    embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)
    return embedding.cpu().numpy()
The context window extends to 32K tokens, but best results are achieved when text is chunked by page (under 8,192 tokens per chunk).
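If your source text has no natural page boundaries, a simple token-count splitter can serve the same purpose. The helper below is a minimal sketch, not part of the model's API; the chunk size and decode-based splitting are assumptions.

def chunk_by_tokens(text: str, tokenizer, max_tokens: int = 8000) -> list[str]:
    # Default stays slightly below the 8,192 limit to leave room for the instruction prefix
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [tokenizer.decode(ids[i:i + max_tokens]) for i in range(0, len(ids), max_tokens)]

# Example usage with a placeholder document string
long_document_text = "..."  # your multi-page document text
chunks = chunk_by_tokens(long_document_text, tokenizer)
chunk_embeddings = [get_embedding(chunk, model, tokenizer)[0] for chunk in chunks]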
Cosine Similarity for Queries and Documents
To compare a query and a document, use the cosine similarity function:
import numpy as np
def cosine_similarity(vecA: np.ndarray, vecB: np.ndarray) -> float:
    normA = np.linalg.norm(vecA)
    normB = np.linalg.norm(vecB)
    # Guard against zero vectors to avoid division by zero
    return float(np.dot(vecA, vecB) / (normA * normB)) if normA > 0 and normB > 0 else 0.0
Example: Query vs. Document Embedding
Here’s how to generate embeddings for a query and a document, and compare their similarity:
query = "What are the legal requirements for property zoning in urban areas?"
document = "This document contains details about urban property zoning laws, including legal frameworks and compliance standards."
query_embedding = get_embedding(query, model, tokenizer)[0]
document_embedding = get_embedding(document, model, tokenizer)[0]
similarity = cosine_similarity(query_embedding, document_embedding)
print(f"Cosine similarity between query and document: {similarity}")
Citation
Please cite APEX-Embedding-7B as follows:
@misc{APEX-embedding-7b,
  title={APEX-Embedding-7B: Improving Embedding Accuracy for Document Retrieval Using Entity Relationship Maps and Model-Aware Contrastive Sampling},
  author={Thea Aviss},
  year={2024},
  url={https://arxiv.org/abs/2410.18105}
}
Model tree for 5DAI/APEX-Embedding-7B-v0.1
Base model
Salesforce/SFR-Embedding-2_R