Qwen2.5-7B-Instruct-MNN

Introduction

This is a 4-bit quantized MNN export of Qwen2.5-7B-Instruct, produced with llmexport.

Download

# install huggingface_hub
pip install huggingface_hub
# shell download
huggingface-cli download taobao-mnn/Qwen2.5-7B-Instruct-MNN --local-dir path/to/dir
# SDK download
from huggingface_hub import snapshot_download
model_dir = snapshot_download('taobao-mnn/Qwen2.5-7B-Instruct-MNN')
# git clone
git clone https://www.modelscope.cn/taobao-mnn/Qwen2.5-7B-Instruct-MNN
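Whichever download path you use, the local directory should end up containing the files llm_demo needs. A minimal sketch for sanity-checking a download; the file names other than config.json are assumptions about llmexport's typical output:

```python
from pathlib import Path

# Files llm_demo expects in the model directory (names beyond
# config.json are assumed, based on llmexport's usual output).
REQUIRED = ["config.json", "llm.mnn", "llm.mnn.weight"]

def missing_files(model_dir):
    """Return the required files not present under model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

For example, `missing_files('path/to/dir')` returns an empty list when the download is complete.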

Usage

# clone MNN source
git clone https://github.com/alibaba/MNN.git

# compile
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j

# run
./llm_demo /path/to/Qwen2.5-7B-Instruct-MNN/config.json prompt.txt
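The config.json passed to llm_demo describes the model files and runtime settings. A hedged sketch of what it typically contains, based on MNN-LLM's conventions; field names and values here are illustrative assumptions, not the exact file shipped with this model:

```json
{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low"
}
```

The "precision" and "memory" settings trade speed for footprint, which pairs with the -DMNN_LOW_MEMORY=true build flag above.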

Documentation

MNN-LLM
