ogma-large

A 32.37M-parameter text embedding model by Axiotic AI, achieving a 57.38 average on MTEB English (66/66 tasks).

A 9-layer transformer with a 512-dim hidden state and mean pooling; the strongest model in the Ogma family.

Highlights

  • 57.38 average on the standard 66-task MTEB English benchmark
  • Matryoshka embeddings — dimensions [32, 64, 128, 256] for flexible storage/compute tradeoffs
  • Symmetric routing — task tokens [QRY], [DOC], [SYM]; recommended: [QRY]/[QRY] (highest MTEB), with [SYM] everywhere as the next-best alternative. [DOC] is exposed for downstream fine-tuning and is not recommended at inference.
  • 1024 token context — handles longer passages than typical small models
  • HuggingFace Hub — load directly, no local package installation needed

Quick Start

import sys

import torch
import yaml
from huggingface_hub import snapshot_download

# Download model from HuggingFace
model_path = snapshot_download("axiotic/ogma-large")
sys.path.insert(0, model_path)

from ogma_model import OgmaModel
from config import OgmaConfig, TaskToken
from tokenizer import OgmaTokenizer

# Load model
with open(f"{model_path}/config.yaml") as f:
    cfg = yaml.safe_load(f)
config = OgmaConfig.from_dict(cfg)
model = OgmaModel(config)
state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Load tokenizer
tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")

# Encode text
sentences = ["The quick brown fox", "A fast auburn canine"]
enc = tokenizer.batch_encode(sentences, max_length=1024)
ids = torch.tensor(enc["input_ids"])
mask = torch.tensor(enc["attention_mask"])

with torch.no_grad():
    embs = model.encode(ids, mask, task=TaskToken.SYM)

# Cosine similarity
sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
print(f"Similarity: {sim.item():.4f}")
print(f"Shape: {embs.shape}")  # (2, 256)

Retrieval (Symmetric Routing)

Ogma is trained for symmetric routing — encode queries and documents with the same task token. The recommended route is [QRY]/[QRY] (both sides use TaskToken.QRY); this benchmarked highest on MTEB. [SYM] everywhere is the next-best symmetric alternative — try it on your data if you want to compare. [DOC] is not recommended at inference — it is exposed for downstream fine-tuning, not as an asymmetric query/document route.

queries = ["What is machine learning?"]
documents = ["ML is a subset of AI...", "The weather is sunny today"]

q_enc = tokenizer.batch_encode(queries, max_length=1024)
d_enc = tokenizer.batch_encode(documents, max_length=1024)

with torch.no_grad():
    # Symmetric: both queries and documents use TaskToken.QRY (not a typo).
    # Swap TaskToken.QRY → TaskToken.SYM on both sides to try the SYM route instead.
    q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
                           torch.tensor(q_enc["attention_mask"]), task=TaskToken.QRY)
    d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
                           torch.tensor(d_enc["attention_mask"]), task=TaskToken.QRY)

scores = q_embs @ d_embs.T
print(f"Relevance scores: {scores}")

Matryoshka Dimensionality Reduction

with torch.no_grad():
    full = model.encode(ids, mask, task=TaskToken.SYM)            # (batch, 256)

# Truncate to the leading 32 dims, then re-normalize (required after truncation)
small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1)  # (batch, 32)
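The same truncate-and-renormalize pattern works at any of the advertised dimensions. A small sweep (a sketch, reusing full from above):

# Compare the sentence-pair similarity at each Matryoshka dimension.
for dim in [32, 64, 128, 256]:
    emb = torch.nn.functional.normalize(full[:, :dim], p=2, dim=-1)
    sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
    print(f"{dim:>4}d similarity: {sim.item():.4f}")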

Architecture

| Component | Details |
|---|---|
| Parameters | 32.37M |
| Layers | 9 |
| Hidden dim | 512 |
| Output dim | 256 |
| Heads | 8 |
| Max seq len | 1024 |
| Matryoshka dims | [32, 64, 128, 256] |
| Pooling | Mean |
| Positional encoding | RoPE |
| FFN | SwiGLU |
| Tokenizer | SentencePiece Unigram (30K vocab) |
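
For intuition, a minimal sketch of masked mean pooling, the pooling strategy listed above (illustrative only, not the model's internal implementation):

import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states: (batch, seq, hidden); attention_mask: (batch, seq), 1 = real token.
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (hidden_states * mask).sum(dim=1)      # sum over non-padding positions
    counts = mask.sum(dim=1).clamp(min=1e-9)        # avoid division by zero
    return summed / counts                          # (batch, hidden)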

MTEB Results (66/66 tasks)

| Category | ogma-large |
|---|---|
| Classification | 68.4 |
| Clustering | 41.6 |
| PairClassification | 84.0 |
| Reranking | 53.1 |
| Retrieval | 43.7 |
| STS | 83.7 |
| Summarization | 30.9 |
| Overall | 57.38 |

Benchmarked with MTEB v2.10.7 on the standard 66-task English benchmark using category averaging (same methodology as the MTEB leaderboard).
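
To reproduce numbers like these, the mteb package evaluates any object that exposes an encode(sentences) method. A rough sketch using the mteb 1.x-style interface; the exact API differs across mteb versions (the results above used v2.10.7), so treat this as an outline:

import mteb

class OgmaWrapper:
    """Minimal adapter: mteb passes a list of strings to encode()."""
    def encode(self, sentences, **kwargs):
        enc = tokenizer.batch_encode(list(sentences), max_length=1024)
        with torch.no_grad():
            embs = model.encode(torch.tensor(enc["input_ids"]),
                                torch.tensor(enc["attention_mask"]),
                                task=TaskToken.QRY)
        return embs.cpu().numpy()

tasks = mteb.get_tasks(tasks=["STSBenchmark"])  # single task as a smoke test
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(OgmaWrapper(), output_folder="results")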

Ogma Model Family

| Model | Params | MTEB-66 | Best For |
|---|---|---|---|
| ogma-large | 32.37M | 57.38 | Maximum quality |
| ogma-base | 13.32M | 56.54 | General purpose |
| ogma-small | 8.60M | 55.79 | Best sub-10M |
| ogma-mini | 3.51M | 51.42 | Edge deployment |
| ogma-micro | 2.32M | 49.77 | Extreme edge |

License

This model is licensed under CC-BY-NC-4.0. Commercial use requires a separate license from Axiotic AI.
