• Quantization of Qwen2.5 14B for edge devices, with a 7.3 GB footprint.

  • One of the best models I have tried for Spanish.

  • Original model: https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5

  • Models Merged:

    • huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    • allura-org/TQ2.5-14B-Aletheia-v1
    • EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    • v000000/Qwen2.5-Lumen-14B
  • All quants were made using the imatrix option with the dataset from here; the pipeline is sketched after the build log below.

  • Using llama.cpp compiled with CUDA support for quantization and inference (an example run follows the log):

ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 3982 (cc2983d3)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
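As a sketch of the quantization workflow under stated assumptions: the GGUF filenames and calibration.txt below are placeholders rather than the actual files used, and Q4_K_M is only an example quant type, not necessarily the one behind the 7.3 GB figure. With those caveats, building with CUDA, computing the importance matrix, and quantizing look roughly like this:

    # Build llama.cpp with CUDA support
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release

    # Compute an importance matrix from a calibration dataset
    # (calibration.txt is a placeholder for the dataset linked above)
    ./build/bin/llama-imatrix -m Q2.5-Veltha-14B-0.5-F16.gguf \
        -f calibration.txt -o imatrix.dat

    # Quantize using the importance matrix (Q4_K_M shown as an example)
    ./build/bin/llama-quantize --imatrix imatrix.dat \
        Q2.5-Veltha-14B-0.5-F16.gguf Q2.5-Veltha-14B-0.5-Q4_K_M.gguf Q4_K_M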

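For inference, a minimal llama-cli run might look like the following; the model filename, context size, and GPU layer count are assumptions to tune for the available VRAM:

    # Run the quantized model with all layers offloaded to the GPUs
    # (-ngl 99 offloads everything; lower it if VRAM runs out)
    # Spanish prompt: "Write a haiku about the rain"
    ./build/bin/llama-cli -m Q2.5-Veltha-14B-0.5-Q4_K_M.gguf \
        -ngl 99 -c 8192 \
        -p "Escribe un haiku sobre la lluvia."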