Q4_K_M GGUF quant of Reflection-Llama-3.1-70B - fixed version.
Runs great on 48GB VRAM, tested.
Ollama modelfile added - version with original system prompt - output is split into "thinking" and "output" tags.
If you want llama 3.1 'vanilla' experience, just remove SYSTEM from modelfile before creating ollama model.

All comments are greatly appreciated, download, test and if you appreciate my work, consider buying me my fuel:

Downloads last month: 14

GGUF

Model size

70.6B params

Architecture

llama

4-bit

Inference Providers NEW

This model is not currently available via any of the supported Inference Providers.

The model cannot be deployed to the HF Inference API: The model has no library tag.

Collection including TeeZee/Reflection-Llama-3.1-70B-GGUF

48 GB VRAM

Collection

Quants that run fast on 2x3090 consuming 48GB total VRAM. • 2 items • Updated Sep 9, 2024