ogma-large

A 32.37M-parameter text embedding model by Axiotic AI, achieving a 57.38 average on MTEB English (66/66 tasks).

A 9-layer transformer with a 512-dim hidden state and mean pooling; the strongest model in the Ogma family.

Highlights

  • 57.38 average on the standard 66-task MTEB English benchmark
  • Matryoshka embeddings — dimensions [32, 64, 128, 256] for flexible storage/compute tradeoffs
  • Symmetric routing — task tokens [QRY], [DOC], [SYM]; recommended: [QRY]/[QRY] (highest MTEB), with [SYM] everywhere as the next-best alternative. [DOC] is exposed for downstream fine-tuning and is not recommended at inference.
  • 1024 token context — handles longer passages than typical small models
  • HuggingFace Hub — load directly, no local package installation needed

Quick Start

import sys

import torch
import yaml
from huggingface_hub import snapshot_download

# Download model from HuggingFace
model_path = snapshot_download("axiotic/ogma-large")
sys.path.insert(0, model_path)

from ogma_model import OgmaModel
from config import OgmaConfig, TaskToken
from tokenizer import OgmaTokenizer

# Load model
with open(f"{model_path}/config.yaml") as f:
    cfg = yaml.safe_load(f)
config = OgmaConfig.from_dict(cfg)
model = OgmaModel(config)
state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Load tokenizer
tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")

# Encode text
sentences = ["The quick brown fox", "A fast auburn canine"]
enc = tokenizer.batch_encode(sentences, max_length=1024)
ids = torch.tensor(enc["input_ids"])
mask = torch.tensor(enc["attention_mask"])

with torch.no_grad():
    embs = model.encode(ids, mask, task=TaskToken.SYM)

# Cosine similarity
sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
print(f"Similarity: {sim.item():.4f}")
print(f"Shape: {embs.shape}")  # (2, 256)

Retrieval (Symmetric Routing)

Ogma is trained for symmetric routing — encode queries and documents with the same task token. The recommended route is [QRY]/[QRY] (both sides use TaskToken.QRY); this benchmarked highest on MTEB. [SYM] everywhere is the next-best symmetric alternative — try it on your data if you want to compare. [DOC] is not recommended at inference — it is exposed for downstream fine-tuning, not as an asymmetric query/document route.

queries = ["What is machine learning?"]
documents = ["ML is a subset of AI...", "The weather is sunny today"]

q_enc = tokenizer.batch_encode(queries, max_length=1024)
d_enc = tokenizer.batch_encode(documents, max_length=1024)

with torch.no_grad():
    # Symmetric: both queries and documents use TaskToken.QRY (not a typo).
    # Swap TaskToken.QRY → TaskToken.SYM on both sides to try the SYM route instead.
    q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
                           torch.tensor(q_enc["attention_mask"]), task=TaskToken.QRY)
    d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
                           torch.tensor(d_enc["attention_mask"]), task=TaskToken.QRY)

scores = q_embs @ d_embs.T
print(f"Relevance scores: {scores}")

Matryoshka Dimensionality Reduction

with torch.no_grad():
    full = model.encode(ids, mask, task=TaskToken.SYM)            # (batch, 256)

# Truncate to the leading 32 dims, then re-normalize (required after truncation)
small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1)  # (batch, 32)
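The same truncate-and-renormalize pattern works at any of the advertised dimensions. A small sweep (a sketch, reusing full from above):

# Compare the sentence-pair similarity at each Matryoshka dimension.
for dim in [32, 64, 128, 256]:
    emb = torch.nn.functional.normalize(full[:, :dim], p=2, dim=-1)
    sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
    print(f"{dim:>4}d similarity: {sim.item():.4f}")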

Architecture

| Component | Details |
|---|---|
| Parameters | 32.37M |
| Layers | 9 |
| Hidden dim | 512 |
| Output dim | 256 |
| Heads | 8 |
| Max seq len | 1024 |
| Matryoshka dims | [32, 64, 128, 256] |
| Pooling | Mean |
| Positional encoding | RoPE |
| FFN | SwiGLU |
| Tokenizer | SentencePiece Unigram (30K vocab) |
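
For intuition, a minimal sketch of masked mean pooling, the pooling strategy listed above (illustrative only, not the model's internal implementation):

import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states: (batch, seq, hidden); attention_mask: (batch, seq), 1 = real token.
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (hidden_states * mask).sum(dim=1)      # sum over non-padding positions
    counts = mask.sum(dim=1).clamp(min=1e-9)        # avoid division by zero
    return summed / counts                          # (batch, hidden)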

MTEB Results (66/66 tasks)

| Category | ogma-large |
|---|---|
| Classification | 68.4 |
| Clustering | 41.6 |
| PairClassification | 84.0 |
| Reranking | 53.1 |
| Retrieval | 43.7 |
| STS | 83.7 |
| Summarization | 30.9 |
| Overall | 57.38 |

Benchmarked with MTEB v2.10.7 on the standard 66-task English benchmark using category averaging (same methodology as the MTEB leaderboard).
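
To reproduce numbers like these, the mteb package evaluates any object that exposes an encode(sentences) method. A rough sketch using the mteb 1.x-style interface; the exact API differs across mteb versions (the results above used v2.10.7), so treat this as an outline:

import mteb

class OgmaWrapper:
    """Minimal adapter: mteb passes a list of strings to encode()."""
    def encode(self, sentences, **kwargs):
        enc = tokenizer.batch_encode(list(sentences), max_length=1024)
        with torch.no_grad():
            embs = model.encode(torch.tensor(enc["input_ids"]),
                                torch.tensor(enc["attention_mask"]),
                                task=TaskToken.QRY)
        return embs.cpu().numpy()

tasks = mteb.get_tasks(tasks=["STSBenchmark"])  # single task as a smoke test
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(OgmaWrapper(), output_folder="results")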

Ogma Model Family

| Model | Params | MTEB-66 | Best For |
|---|---|---|---|
| ogma-large | 32.37M | 57.38 | Maximum quality |
| ogma-base | 13.32M | 56.54 | General purpose |
| ogma-small | 8.60M | 55.79 | Best sub-10M |
| ogma-mini | 3.51M | 51.42 | Edge deployment |
| ogma-micro | 2.32M | 49.77 | Extreme edge |

License

This model is licensed under CC-BY-NC-4.0. Commercial use requires a separate license from Axiotic AI.
