This is a 2.4-bit EXL2 quantization of Aurelian v0.5 70B 32K, an interim checkpoint before v1.0. See that page for more details.
This quantization fits in a single 24GB using Exllamav2 & 8-bit cache @ 10K context. It uses the newer experimental quantization method from turboderp.
- Downloads last month
- 19
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.