gpt-oss-120b-Derestricted-mxfp4-mlx

MLX MXFP4 quantization of ArliAI/gpt-oss-120b-Derestricted.

Model Capabilities

  • Reasoning: Configurable effort (low/medium/high) via the reasoning_effort parameter (see the sketch after this list)
  • Tool Use: Native function calling support
  • Context: 131k tokens
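A minimal Python sketch of setting the reasoning effort through mlx-lm. It assumes the model's chat template consumes a reasoning_effort variable forwarded by apply_chat_template; verify against your mlx-lm and template versions:

from mlx_lm import load, generate

model, tokenizer = load("txgsync/gpt-oss-120b-Derestricted-mxfp4-mlx")

messages = [{"role": "user", "content": "Explain MXFP4 quantization in one paragraph."}]

# reasoning_effort is forwarded to the chat template; whether the template
# actually consumes it depends on the template shipped with the model
# (assumption, not verified here).
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    reasoning_effort="high",  # low / medium / high
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=512))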

Quantization Details

This quantization matches OpenAI's original MXFP4 scheme (a conversion sketch follows the table):

Component     Bits   Group Size   Format
MLP Experts   4      32           MXFP4
Attention     -      -            Full precision (bfloat16)
Routers       -      -            Full precision (bfloat16)
Embeddings    -      -            Full precision (bfloat16)
LM Head       -      -            Full precision (bfloat16)
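For reference, a minimal sketch of how such a selective scheme could be reproduced with mlx-lm's convert API. The module-name substrings and the mode="mxfp4" argument are assumptions (module names vary by implementation, and MXFP4 mode requires a recent mlx); this is not necessarily the exact recipe used for this repo:

from mlx_lm import convert

# Substrings of module paths to keep in full precision (assumed names).
KEEP_FULL_PRECISION = ("self_attn", "router", "embed", "lm_head")

def quant_predicate(path, module, config):
    # Only modules that implement to_quantized can be quantized at all.
    if not hasattr(module, "to_quantized"):
        return False
    # Leave attention, routers, embeddings, and the LM head in bfloat16.
    if any(name in path for name in KEEP_FULL_PRECISION):
        return False
    # Quantize the MLP experts to 4 bits with group size 32.
    # mode="mxfp4" assumes a recent mlx; drop it on older versions
    # to fall back to affine quantization.
    return {"bits": 4, "group_size": 32, "mode": "mxfp4"}

convert(
    "ArliAI/gpt-oss-120b-Derestricted",
    mlx_path="gpt-oss-120b-Derestricted-mxfp4-mlx",
    quantize=True,
    quant_predicate=quant_predicate,
)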

Usage

mlx-lm

mlx_lm.chat --model txgsync/gpt-oss-120b-Derestricted-mxfp4-mlx
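For one-shot generation instead of an interactive chat, the standard mlx_lm.generate CLI works as well:

mlx_lm.generate --model txgsync/gpt-oss-120b-Derestricted-mxfp4-mlx --prompt "Hello" --max-tokens 256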

LM Studio

For full Reasoning Effort support in LM Studio, install via:

lms get txgsync/gpt-oss-120b-derestricted

This downloads the model along with a virtual-model wrapper that enables the Reasoning Effort selector in the LM Studio UI.

Notes

  • Requires a build of mlx-lm with gpt_oss HF-format support (upgrade command below)
  • Quantization matches OpenAI's modules_to_not_convert scheme, keeping the sensitive modules listed above in full precision for optimal quality
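To get gpt_oss support, upgrade to a recent mlx-lm (standard pip command; the exact minimum version is not pinned here):

pip install -U mlx-lm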