# ogma-large

A 32.37M-parameter text embedding model by Axiotic AI, achieving a 57.38 average on MTEB English (66/66 tasks). 9-layer transformer, 512 hidden dim, mean pooling — the strongest model in the Ogma family.
## Highlights
- 57.38 MTEB average on the standard 66-task MTEB English benchmark
- Matryoshka embeddings — dimensions [32, 64, 128, 256] for flexible storage/compute tradeoffs
- Symmetric routing — task tokens `[QRY]`, `[DOC]`, `[SYM]`; recommended: `[QRY]`/`[QRY]` (highest MTEB), with `[SYM]` everywhere as the next-best alternative. `[DOC]` is exposed for downstream fine-tuning and is not recommended at inference.
- 1024-token context — handles longer passages than typical small models
- HuggingFace Hub — load directly, no local package installation needed
## Quick Start
```python
import sys

import torch
import yaml
from huggingface_hub import snapshot_download

# Download the model from the HuggingFace Hub
model_path = snapshot_download("axiotic/ogma-large")
sys.path.insert(0, model_path)

from ogma_model import OgmaModel
from config import OgmaConfig, TaskToken
from tokenizer import OgmaTokenizer

# Load the model
with open(f"{model_path}/config.yaml") as f:
    cfg = yaml.safe_load(f)
config = OgmaConfig.from_dict(cfg)
model = OgmaModel(config)
state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Load the tokenizer
tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")

# Encode text
sentences = ["The quick brown fox", "A fast auburn canine"]
enc = tokenizer.batch_encode(sentences, max_length=1024)
ids = torch.tensor(enc["input_ids"])
mask = torch.tensor(enc["attention_mask"])
with torch.no_grad():
    embs = model.encode(ids, mask, task=TaskToken.SYM)

# Cosine similarity
sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
print(f"Similarity: {sim.item():.4f}")
print(f"Shape: {embs.shape}")  # (2, 256)
```
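The pairwise `cosine_similarity` call above extends to whole batches with a single normalized matrix product. A minimal sketch of that pattern, using random stand-in tensors in place of `model.encode` output:

```python
import torch

# Stand-in for model.encode output: 4 embeddings of dim 256 (random, for illustration).
embs = torch.randn(4, 256)

# L2-normalize, then one matrix product yields every pairwise cosine similarity.
normed = torch.nn.functional.normalize(embs, p=2, dim=-1)
sims = normed @ normed.T  # (4, 4); diagonal entries are 1.0

print(sims.shape)
```

This avoids an O(n²) loop of per-pair calls and is the usual way to score a batch of embeddings at once.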
## Retrieval (Symmetric Routing)
Ogma is trained for symmetric routing — encode queries and documents with the same task token. The recommended route is [QRY]/[QRY] (both sides use TaskToken.QRY); this benchmarked highest on MTEB. [SYM] everywhere is the next-best symmetric alternative — try it on your data if you want to compare. [DOC] is not recommended at inference — it is exposed for downstream fine-tuning, not as an asymmetric query/document route.
```python
queries = ["What is machine learning?"]
documents = ["ML is a subset of AI...", "The weather is sunny today"]

q_enc = tokenizer.batch_encode(queries, max_length=1024)
d_enc = tokenizer.batch_encode(documents, max_length=1024)

with torch.no_grad():
    # Symmetric: both queries and documents use TaskToken.QRY (not a typo).
    # Swap TaskToken.QRY → TaskToken.SYM on both sides to try the SYM route instead.
    q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
                          torch.tensor(q_enc["attention_mask"]), task=TaskToken.QRY)
    d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
                          torch.tensor(d_enc["attention_mask"]), task=TaskToken.QRY)

scores = q_embs @ d_embs.T
print(f"Relevance scores: {scores}")
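To turn the score matrix into a ranking, `torch.topk` over each query row is enough. A small sketch with a hypothetical score matrix (the values are made up; no model is required):

```python
import torch

# Hypothetical relevance scores for 2 queries over 3 documents.
scores = torch.tensor([[0.82, 0.11, 0.35],
                       [0.20, 0.78, 0.64]])

# Top-2 document indices per query, best first.
top_scores, top_idx = scores.topk(k=2, dim=-1)
print(top_idx.tolist())  # [[0, 2], [1, 2]]
```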
## Matryoshka Dimensionality Reduction
```python
full = model.encode(ids, mask, task=TaskToken.SYM)                # (256d)
small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1)  # (32d)
```
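Truncation should always be followed by re-normalization, at any of the supported Matryoshka dimensions. A quick sanity check of that invariant, using a random stand-in for `model.encode` output:

```python
import torch
import torch.nn.functional as F

full = torch.randn(2, 256)  # stand-in for model.encode output

# Each Matryoshka prefix is re-normalized after truncation so that
# cosine similarity reduces to a plain dot product at every dimension.
for dim in [32, 64, 128, 256]:
    small = F.normalize(full[:, :dim], p=2, dim=-1)
    print(dim, small.norm(dim=-1))  # unit norms after renormalization
```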
## Architecture
| Component | Details |
|---|---|
| Parameters | 32.37M |
| Layers | 9 |
| Hidden dim | 512 |
| Output dim | 256 |
| Heads | 8 |
| Max seq len | 1024 |
| Matryoshka | [32, 64, 128, 256] |
| Pooling | Mean |
| Positional | RoPE |
| FFN | SwiGLU |
| Tokenizer | SentencePiece Unigram (30K) |
## MTEB Results (66/66 tasks)
| Category | ogma-large |
|---|---|
| Classification | 68.4 |
| Clustering | 41.6 |
| PairClassification | 84.0 |
| Reranking | 53.1 |
| Retrieval | 43.7 |
| STS | 83.7 |
| Summarization | 30.9 |
| Overall | 57.38 |
Benchmarked with MTEB v2.10.7 on the standard 66-task English benchmark using category averaging (same methodology as the MTEB leaderboard).
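The category-averaging methodology (mean over tasks within each category, then mean of the category means) can be sketched in a few lines; the scores below are hypothetical placeholders, not ogma-large results:

```python
# Hypothetical per-task scores, grouped by MTEB category (illustration only).
per_task = {
    "Classification": [70.0, 66.0],
    "STS": [84.0, 82.0],
    "Retrieval": [44.0, 42.0],
}

# Step 1: average tasks within each category.
category_means = {c: sum(s) / len(s) for c, s in per_task.items()}

# Step 2: average the category means for the overall score.
overall = sum(category_means.values()) / len(category_means)
print(category_means, round(overall, 2))
```

Category averaging weights each category equally, so a category with many tasks (e.g. the CQADupstack retrieval splits) does not dominate the overall number.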
## Ogma Model Family
| Model | Params | MTEB-66 | Best For |
|---|---|---|---|
| ogma-large | 32.37M | 57.38 | Maximum quality |
| ogma-base | 13.32M | 56.54 | General purpose |
| ogma-small | 8.60M | 55.79 | Best sub-10M |
| ogma-mini | 3.51M | 51.42 | Edge deployment |
| ogma-micro | 2.32M | 49.77 | Extreme edge |
## License
This model is licensed under CC-BY-NC-4.0. Commercial use requires a separate license from Axiotic AI.
## Evaluation results

| Task (MTEB, test set) | Metric | Self-reported score |
|---|---|---|
| AmazonCounterfactualClassification | accuracy | 72.850 |
| AmazonPolarityClassification | accuracy | 83.510 |
| AmazonReviewsClassification | accuracy | 39.850 |
| BiorxivClusteringP2P | v_measure | 34.840 |
| BiorxivClusteringS2S | v_measure | 27.020 |
| CQADupstackAndroidRetrieval | ndcg_at_10 | 38.980 |
| CQADupstackEnglishRetrieval | ndcg_at_10 | 39.780 |
| CQADupstackGamingRetrieval | ndcg_at_10 | 48.240 |
| CQADupstackGisRetrieval | ndcg_at_10 | 33.090 |
| CQADupstackMathematicaRetrieval | ndcg_at_10 | 25.360 |
| CQADupstackPhysicsRetrieval | ndcg_at_10 | 38.020 |
| CQADupstackProgrammersRetrieval | ndcg_at_10 | 36.420 |
| CQADupstackRetrieval | ndcg_at_10 | 33.610 |
| CQADupstackStatsRetrieval | ndcg_at_10 | 28.070 |
| CQADupstackTexRetrieval | ndcg_at_10 | 23.290 |
| CQADupstackUnixRetrieval | ndcg_at_10 | 32.780 |