• Quantization of Qwen2.5 14B for edge devices, with a 7.3 GB footprint

  • One of the best models I have tried for Spanish.

  • Original model: https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5

  • Models Merged:

    • huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    • allura-org/TQ2.5-14B-Aletheia-v1
    • EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    • v000000/Qwen2.5-Lumen-14B
  • All quants were made using the imatrix option, with the dataset from here

  • Using llama.cpp compiled with CUDA support for quantization and inference:

ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 3982 (cc2983d3)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
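The imatrix workflow mentioned above usually comes down to two llama.cpp invocations: `llama-imatrix` to compute an importance matrix from a calibration text, then `llama-quantize --imatrix` to produce the quantized GGUF. The Python sketch below only builds those two command lines. The file names, the `Q4_K_M` quant type, and the `-ngl 99` offload value are placeholders chosen for illustration; the card does not state which ones were actually used.

```python
import shlex


def build_imatrix_quant_commands(f16_gguf, calibration_txt, out_gguf,
                                 quant_type, imatrix_path="imatrix.dat",
                                 gpu_layers=99):
    """Return (imatrix_cmd, quantize_cmd) as argument lists for llama.cpp.

    All paths and the quant type are supplied by the caller; nothing here
    reflects the exact settings used for this model.
    """
    # Step 1: compute the importance matrix from a calibration text file.
    imatrix_cmd = [
        "llama-imatrix",
        "-m", f16_gguf,            # full-precision GGUF to analyse
        "-f", calibration_txt,     # calibration dataset (plain text)
        "-o", imatrix_path,        # where to write the importance matrix
        "-ngl", str(gpu_layers),   # layers to offload to the GPU(s)
    ]
    # Step 2: quantize, letting the importance matrix guide the rounding.
    quantize_cmd = [
        "llama-quantize",
        "--imatrix", imatrix_path,
        f16_gguf,
        out_gguf,
        quant_type,
    ]
    return imatrix_cmd, quantize_cmd


if __name__ == "__main__":
    # Hypothetical file names and quant type, for illustration only.
    commands = build_imatrix_quant_commands(
        "model-f16.gguf", "calibration.txt", "model-Q4_K_M.gguf", "Q4_K_M"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the llama.cpp binaries are on `PATH`.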

  • Format: GGUF
  • Model size: 14.8B params
  • Architecture: qwen2
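As a rough sanity check, the footprint quoted at the top (7.3 GB) and the parameter count (14.8B) give an average number of bits stored per weight. The sketch below is only an estimate: it assumes the whole file is weight data (a GGUF file also holds metadata and a tokenizer), and because the card does not say whether "GB" is decimal or binary, both readings are computed.

```python
def bits_per_weight(file_size_bytes, n_params):
    """Average bits stored per parameter, ignoring GGUF metadata overhead."""
    return file_size_bytes * 8 / n_params


N_PARAMS = 14.8e9   # parameter count from the model card
FOOTPRINT = 7.3     # footprint from the model card, unit ambiguous

bpw_decimal = bits_per_weight(FOOTPRINT * 1000**3, N_PARAMS)  # GB = 10^9 bytes
bpw_binary = bits_per_weight(FOOTPRINT * 1024**3, N_PARAMS)   # GiB = 2^30 bytes

print(f"decimal GB: {bpw_decimal:.2f} bits/weight")  # about 3.95
print(f"binary GiB: {bpw_binary:.2f} bits/weight")   # about 4.24
```

Either way the result is close to 4 bits per weight, compared with 16 for the unquantized weights.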