ogma-micro · 2.3M efficient text embedding model · MTEB 52.18
Ultra-small English text embedding model for semantic search, RAG, vector search, clustering, classification, and agent memory — MTEB 52.18, 2.3M parameters, 128d output
Ogma Micro is the most compact model in the Ogma family. At 2.3M parameters and 8.9 MB it scores 52.18 on MTEB in our 66-task run while staying small enough to ship in browsers and on-device runtimes. It outputs 128-dimensional embeddings to keep vector indexes small and targets latency-critical, edge, and browser workloads.
Why the name Ogma?
Ogma is named after Ogma (also written Oghma), the Irish god associated with eloquence and credited in myth with inventing Ogham, an early alphabet for encoding language into symbols. That is the core job of an embedding model: turn language into compact vectors that machines can search, compare, cluster, and reason over.
Use cases
ogma-micro is the smallest Ogma model, built for on-device embedding, edge search, browser-side retrieval, local semantic search, agent memory, deduplication, classification, clustering, and privacy-sensitive applications where sending text to an external embedding API is undesirable.
Good fits:
- Mobile and desktop apps that need local text embeddings without a large model download.
- Browser, WebAssembly, and extension-style workflows where package size and vector index size matter.
- Serverless and high-fanout applications that need many cheap embedding calls with predictable memory use.
- Local-first search over notes, messages, logs, support tickets, snippets, or small document collections.
- Efficient vector databases where 128-dimensional embeddings reduce storage, bandwidth, and ANN latency.
Choose ogma-micro when footprint matters more than absolute benchmark quality. Move up to ogma-mini or ogma-small when you can spend more memory for stronger representations.
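To make the storage argument concrete, here is a minimal sketch that estimates the raw size of a flat float32 index at different embedding widths. The helper name `flat_index_bytes` and the one-million-vector corpus are assumptions for illustration; real vector databases add graph, metadata, or quantisation effects on top of this raw figure.

```python
def flat_index_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a flat index: one value per dimension per vector (float32 = 4 bytes)."""
    return n_vectors * dim * bytes_per_value


n = 1_000_000  # hypothetical corpus size
for dim in (384, 256, 128, 64, 32):
    mb = flat_index_bytes(n, dim) / 1e6  # decimal megabytes
    print(f"{dim:>3}d: {mb:,.0f} MB")
```

Under these assumptions a 128d index needs 512 MB per million vectors, half of a 256d index and a third of a 384d one.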
Highlights
- 🏆 MTEB avg 52.18 with 2.3M parameters (canonical results from the Ogma paper)
- 📦 8.9 MB — smallest in the family
- 📐 128-dim output — half the index size of other Ogma models
- 📏 1024-token context — 4× longer than all-MiniLM-L6-v2 (256 tokens)
- 🔀 Symmetric routing via task tokens: encode everything with [SYM], or use [QRY]/[QRY] for retrieval (queries and documents both encoded with task="qry"); benchmark both routes on your task
- 📐 Matryoshka dims: [128, 64, 32], so embeddings can be compressed to 32d for ultra-low memory indexing
Performance
MTEB English — 66/66 tasks (category-averaged)
Benchmarked with MTEB v2.10.7 on the standard 66-task English benchmark using category averaging (same methodology as the MTEB leaderboard).
| Category | ogma-micro | all-MiniLM-L6-v2 | Δ vs MiniLM |
|---|---|---|---|
| Classification | 59.53 | 62.62 | -3.09 |
| Clustering | 36.88 | 41.94 | -5.06 |
| PairClassification | 78.62 | 82.37 | -3.75 |
| Reranking | 49.74 | 58.04 | -8.30 |
| Retrieval | 33.09 | 41.95 | -8.86 |
| STS | 75.63 | 78.90 | -3.27 |
| Summarization | 31.77 | 30.81 | +0.96 |
| Overall | 52.18 | 56.09 | -3.91 |
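Category averaging means the overall score is the unweighted mean of the seven category scores, not the mean over all 66 tasks. A small sketch that recomputes the ogma-micro figure from the table values (the variable names are illustrative):

```python
from statistics import mean

# ogma-micro category scores copied from the table above
category_scores = {
    "Classification": 59.53,
    "Clustering": 36.88,
    "PairClassification": 78.62,
    "Reranking": 49.74,
    "Retrieval": 33.09,
    "STS": 75.63,
    "Summarization": 31.77,
}

# Each category counts once, regardless of how many tasks it contains
overall = mean(category_scores.values())
print(f"{overall:.2f}")  # 52.18
```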
Why choose Ogma Micro?
ogma-micro is for when you need the absolute smallest possible model that still achieves competitive MTEB scores. Note the 128-dim output — your vector index will be half the size of other Ogma models. Use ogma-mini if you can afford 3.5M parameters.
Safety — Toxicity & Prompt Injection Detection
Evaluated on the Ogma transformer architecture (same family). Embeddings are extracted then fed to a logistic regression (LR) or MLP classifier head — the embedding model itself is not fine-tuned. Evaluated against all-MiniLM-L6-v2 as baseline.
1. Jigsaw Toxic Comment Classification
Dataset: Arsive/toxicity_classification_jigsaw — Binary toxicity classification
Train: 25,960 · Test: 6,490
| Model | Classifier | Accuracy | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|---|---|
| Ogma | LogReg | 89.12% | 88.26% | 89.09% | 87.44% | 95.74% |
| Ogma | MLP | 88.91% | 87.98% | 89.14% | 86.85% | 95.92% |
| MiniLM | LogReg | 87.32% | 86.25% | 87.46% | 85.07% | 94.96% |
| MiniLM | MLP | 91.71% | 91.24% | 90.13% | 92.39% | 97.16% |
Ogma (LR) leads MiniLM (LR) by +2.01% F1. MiniLM (MLP) leads on this dataset — the additional training data (25K samples) allows the MLP to compensate for MiniLM's slightly weaker base representations.
2. Prompt Injection Detection — deepset/prompt-injections
Dataset: deepset/prompt-injections — Binary injection detection
Train: 546 · Test: 116 (low-data regime)
| Model | Classifier | Accuracy | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|---|---|
| Ogma | LogReg | 86.21% | 84.62% | 100.0% | 73.33% | 97.77% |
| Ogma | MLP | 90.52% | 90.27% | 96.23% | 85.0% | 98.1% |
| MiniLM | LogReg | 82.76% | 80.39% | 97.62% | 68.33% | 94.52% |
| MiniLM | MLP | 87.07% | 86.24% | 95.92% | 78.33% | 93.96% |
Ogma leads across both classifiers: +4.03% F1 (MLP), +4.23% F1 (LogReg). Ogma's representations are better separated in the low-data regime — it achieves 100% precision with LogReg, meaning zero false positives.
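The table reports percentages only. As a sanity check, the sketch below recomputes the headline metrics from a confusion matrix. The counts used (44 true positives, 0 false positives, 16 false negatives, 56 true negatives) are an assumption: they are inferred from the 116-example test set and the reported Ogma LogReg percentages, not taken from the source.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Hypothetical counts, inferred from the reported percentages (not from the source)
m = binary_metrics(tp=44, fp=0, fn=16, tn=56)
print({name: round(100 * value, 2) for name, value in m.items()})
```

With zero false positives, precision is 100% by definition, while the 16 missed injections hold recall at 73.33% and F1 at 84.62%.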
3. Prompt Injection Detection — neuralchemy/Prompt-injection-dataset
Dataset: neuralchemy/Prompt-injection-dataset — Binary injection detection
Train: 4,391 · Test: 942
| Model | Classifier | Accuracy | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|---|---|
| Ogma | LogReg | 95.22% | 95.93% | 95.84% | 96.01% | 99.30% |
| Ogma | MLP | 95.44% | 96.16% | 94.89% | 97.46% | 99.37% |
| MiniLM | LogReg | 94.59% | 95.38% | 95.46% | 95.29% | 98.92% |
| MiniLM | MLP | 93.95% | 94.85% | 94.59% | 95.11% | 98.92% |
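Ogma leads on every metric when the classifier heads are matched: +1.31% F1 (MLP), +0.55% F1 (LR). Best against best, the gap is +0.78% F1 (Ogma MLP 96.16% vs MiniLM LR 95.38%). Both models perform well at this scale; Ogma keeps its edge and reaches a higher AUC-ROC (99.37% vs 98.92%).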
Summary
| Task | Ogma best F1 | MiniLM best F1 | Δ |
|---|---|---|---|
| Jigsaw Toxicity | 88.26% (LR) | 91.24% (MLP) | −2.98% |
| deepset Injection | 90.27% (MLP) | 86.24% (MLP) | +4.03% |
| neuralchemy Injection | 96.16% (MLP) | 95.38% (LR) | +0.78% |
Ogma is a stronger feature extractor for prompt injection detection — the safety-critical task for agent pipelines. MiniLM edges ahead on toxicity when given sufficient labelled data and a more powerful classifier head. For agentic use cases where detecting adversarial instructions is the priority, Ogma representations are the better choice.
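The summary compares each model's best F1 regardless of classifier head. A short sketch that reproduces the Δ column from the per-classifier F1 values in the three tables above (the dictionary layout is just one convenient way to hold them):

```python
# F1 scores (%) copied from the three tables above, as (LogReg, MLP)
f1 = {
    "Jigsaw Toxicity": {"Ogma": (88.26, 87.98), "MiniLM": (86.25, 91.24)},
    "deepset Injection": {"Ogma": (84.62, 90.27), "MiniLM": (80.39, 86.24)},
    "neuralchemy Injection": {"Ogma": (95.93, 96.16), "MiniLM": (95.38, 94.85)},
}

# Best-vs-best: take each model's stronger head, then subtract
deltas = {
    task: round(max(scores["Ogma"]) - max(scores["MiniLM"]), 2)
    for task, scores in f1.items()
}
print(deltas)
```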
Architecture
| Property | Value |
|---|---|
| Architecture | Custom Transformer |
| Internal dim (d_model) | 128 |
| Output dim (d_output) | 128 |
| Transformer layers | 2 |
| Attention heads | 2 |
| Vocabulary | 30,000 (SentencePiece / AlbertTokenizer) |
| Max sequence length | 1,024 tokens |
| Pooling | Mean pooling |
| Task tokens | [QRY] (query), [DOC] (document), [SYM] (symmetric) |
| Matryoshka dims | [32, 64, 128] |
| Output normalisation | L2 (unit sphere) |
| Parameters | 2.3M |
| Model file | model.safetensors (8.9 MB) |
Key design choices:
- Task token prepend: A learnable task token ([QRY], [DOC], or [SYM]) is prepended to the input sequence before the transformer. Recommended inference route: [QRY]/[QRY], i.e. encode both queries and documents with [QRY]; this benchmarked highest on MTEB. [SYM] everywhere is the next-best symmetric alternative. We do not recommend [DOC] at inference time: it is exposed for downstream fine-tuning, not as an asymmetric query/document route.
- Matryoshka training: The model is trained with Matryoshka Representation Learning, meaning embeddings truncated to any supported sub-dimension remain well-calibrated without retraining.
- Mean pooling: The average of all token outputs (excluding padding) produces the sentence embedding, which consistently outperforms CLS-token pooling in the Ogma architecture family.
- L2 normalisation: All outputs are unit-normalised, so cosine similarity equals the dot product, and squared Euclidean distance is 2 - 2·cos. All three therefore produce the same ranking, which simplifies downstream usage.
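The last two design choices can be illustrated with a toy, pure-Python sketch: masked mean pooling followed by L2 normalisation, then a check of how dot product and Euclidean distance relate for unit vectors. The helper names and the 3-dimensional token vectors are made up for illustration and do not reflect Ogma's internal code.

```python
import math


def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1; padding positions are skipped."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy token outputs; the last row of tokens_a is a padding position
tokens_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
mask_a = [1, 1, 0]
tokens_b = [[0.0, 2.0, 2.0], [0.0, 0.0, 2.0]]
mask_b = [1, 1]

a = l2_normalise(masked_mean_pool(tokens_a, mask_a))
b = l2_normalise(masked_mean_pool(tokens_b, mask_b))

cosine = dot(a, b)  # for unit vectors, cosine similarity is the dot product
sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(abs(sq_dist - (2 - 2 * cosine)) < 1e-12)  # True
```

Because squared distance is a decreasing function of cosine similarity, nearest-neighbour search by either metric returns the same neighbours.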
Usage
Installation
pip install torch tokenizers transformers huggingface_hub
Basic Encoding
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("axiotic/ogma-micro", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-micro", trust_remote_code=True)
sentences = [
    "The quick brown fox jumps over the lazy dog",
    "A fast auburn vulpine leaps over an idle canine",
    "The capital of France is Paris",
]
emb = model.embed(sentences, task="sym", tokenizer=tok)
# emb.shape → (128,) per sentence, L2-normalised
sim = (emb[0] @ emb[1]).item() # cosine sim == dot product (L2-normalised)
print(f"paraphrase: {sim:.4f}")
task="sym" is a safe default for all similarity tasks (STS, clustering,
classification) and for retrieval. Ogma is trained for symmetric routing —
queries and documents are always encoded with the same task token. The two
recommended routes are:
- [SYM] for everything (the safe default above), or
- [QRY]/[QRY]: encode both queries and documents with task="qry".
Try both on your downstream task; either can win depending on the data, and
[QRY]/[QRY] is the natural starting point when fine-tuning a classifier or
retrieval head on top of the embeddings.
Retrieval
Encode queries and documents with the same task token. Below we show the [QRY]/[QRY] route — both calls use task="qry". This is intentional (Ogma is symmetric, not asymmetric); swap in task="sym" to compare the SYM route on your data.
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("axiotic/ogma-micro", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-micro", trust_remote_code=True)
queries = ["What is knowledge distillation?"]
docs = [
    "Knowledge distillation trains a smaller student model to mimic a larger teacher.",
    "The Eiffel Tower is in Paris, France.",
]
q = model.embed(queries, task="qry", tokenizer=tok) # (128,) per query — symmetric: both sides use qry
d = model.embed(docs, task="qry", tokenizer=tok) # (128,) per doc — not a typo; Ogma is symmetric
scores = (q @ d.T).squeeze(0) # cosine sim (L2-normalised, dot == cosine)
print(scores.tolist()) # [higher, lower] — first doc is relevant
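Once the similarity scores are available as a plain list, ranking is a sort. The sketch below shows one way to pick the top-k documents; `top_k` is a hypothetical helper, and the scores are placeholder values standing in for the output of `scores.tolist()` above, not real model output.

```python
def top_k(scores, docs, k=3):
    """Return the k highest-scoring (score, doc) pairs, best first."""
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


# Placeholder scores for illustration only (not produced by the model)
example_scores = [0.71, 0.05]
example_docs = [
    "Knowledge distillation trains a smaller student model to mimic a larger teacher.",
    "The Eiffel Tower is in Paris, France.",
]

for score, doc in top_k(example_scores, example_docs, k=1):
    print(f"{score:.2f}  {doc}")
```

For larger collections an approximate nearest-neighbour index usually replaces the full sort, but the ranking criterion stays the same.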
Matryoshka — Flexible Dimensionality
Ogma is trained with Matryoshka Representation Learning. Slice and re-normalise to any supported sub-dimension with no retraining:
import torch, torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("axiotic/ogma-micro", trust_remote_code=True).eval()
tok = AutoTokenizer.from_pretrained("axiotic/ogma-micro", trust_remote_code=True)
emb = model.embed(["hello world"], task="sym", tokenizer=tok) # full 128d
for d in model.config.matryoshka_dims:
    sub = F.normalize(emb[:, :d], dim=-1)  # keep the first d dims, then re-normalise to unit length
    print(f"{d}d norm={sub.norm(dim=-1).item():.4f}")
Model Family
| Model | Params | Size | MTEB Avg | Class | Clust | PairClass | Rerank | Ret | STS | Summ | d_out | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ogma-large | 32.4M | 124 MB | 57.41 | 68.6 | 41.6 | 84.0 | 53.1 | 43.7 | 83.7 | 30.9 | 256 | 1024 |
| ogma-base | 13.3M | 51 MB | 57.02 | 67.74 | 41.49 | 83.73 | 51.25 | 42.36 | 82.84 | 29.73 | 256 | 1024 |
| ogma-small | 8.6M | 33 MB | 56.32 | 66.49 | 40.69 | 82.91 | 50.51 | 42.05 | 82.00 | 29.59 | 256 | 1024 |
| ogma-mini | 3.5M | 14 MB | 53.06 | 61.77 | 37.38 | 79.66 | 47.39 | 36.21 | 77.71 | 31.33 | 256 | 1024 |
| ogma-micro | 2.3M | 8.9 MB | 52.18 | 59.53 | 36.88 | 78.62 | 49.74 | 33.09 | 75.63 | 31.77 | 128 | 1024 |
| all-MiniLM-L6-v2 | 22.7M | 87 MB | 56.09 | 62.62 | 41.94 | 82.37 | 58.04 | 41.95 | 78.90 | 30.81 | 384 | 256 |
| potion-base-32M | 32.0M | 123 MB | 51.22 | 66.0 | 39.2 | 78.2 | 50.9 | 32.2 | 73.9 | 29.8 | 256 | inf |
| potion-base-8M | 7.6M | 29 MB | 50.03 | 64.44 | 32.93 | 76.62 | 49.73 | 31.71 | 73.24 | 29.28 | 256 | inf |
All Ogma: MTEB 2.10.7, 66-task standard English set, category-averaged. MiniLM/Potion: published scores from the Model2Vec results page.
Training Details
| Property | Value |
|---|---|
| Teacher model | jinaai/jina-embeddings-v5-text-small (CC-BY-NC-4.0) |
| Training paradigm | Knowledge distillation from cached teacher embeddings |
| Training data | ~7M curated English sentence pairs |
| Tokenizer | AlbertTokenizer (SentencePiece, vocab=30,000) |
| Embedding initialisation | PCA of teacher embeddings (128d) projected to d_model |
| Loss | Distillation + contrastive (balanced schedule) |
| Evaluation framework | MTEB 2.10.7 |
Limitations
- No text generation. Ogma is an encoder-only embedding model.
- English only. Training data and evaluation are English-only.
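- Slower than static models. Transformer inference is 40–100× slower on CPU than static models (Potion, Model2Vec). The trade-off is contextual understanding. Note that the 4× context-length advantage is relative to all-MiniLM-L6-v2 (256 tokens); the static models in the family table have no fixed context limit.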
- Non-commercial licence. Due to distillation from a CC-BY-NC-4.0 teacher, Ogma inherits the NonCommercial restriction. Commercial use requires a separate Jina AI licence or retraining with a permissive teacher (Apache 2.0 compatible models like BGE or E5 can substitute at the cost of a full retraining run).
- Reranking gap. Ogma lags behind MiniLM-L6-v2 on reranking tasks (category avg delta: -8.3). This is an architectural characteristic: the model optimises for semantic similarity and classification over pairwise ranking.
Licence & Attribution
This model is released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
Required attribution (must be included in all uses):
This model was trained via knowledge distillation from jina-embeddings-v5-text-small (https://huggingface.co/jinaai/jina-embeddings-v5-text-small) by Jina AI, licensed under CC-BY-NC-4.0.
Citation
@misc{ogma2026,
title = {Ogma: Efficient Dense Retrieval via Structured Embeddings},
author = {Axiotic AI},
year = {2026},
url = {https://huggingface.co/axiotic/ogma-micro},
}
Evaluation results
- cosine_spearman on MTEB STSBenchmark (test set): 77.82 (self-reported)
- accuracy on MTEB AmazonPolarityClassification (test set): 67.63 (self-reported)
- v_measure on MTEB RedditClustering (test set): 37.83 (self-reported)
- cos_sim_ap on MTEB TwitterSemEval2015 (test set): 60.03 (self-reported)
- map on MTEB MindSmallReranking (validation set): 30.10 (self-reported)
- ndcg_at_10 on MTEB MSMARCO: 21.78 (self-reported)
- cos_sim_spearman on MTEB SummEval (test set): 31.77 (self-reported)