
This is a custom INT8 version of the original BLOOM weights, prepared for fast loading with the DeepSpeed-Inference engine, which uses Tensor Parallelism. In this repo the tensors are split into 8 shards, one per target GPU (8 GPUs in total).
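To give a sense of how the 8 shards map onto 8 tensor-parallel ranks, here is a rough, hypothetical sketch of a DeepSpeed checkpoint index: each rank loads only its own shard. The file name, field names, and shard names below are illustrative assumptions, not this repo's actual layout.

```python
# Hypothetical sketch of a checkpoint index for a BLOOM checkpoint
# pre-sharded 8 ways for tensor parallelism. All names here are
# illustrative assumptions; check this repo for the real files.
import json

index = {
    "type": "BLOOM",
    "version": 1.0,
    # one entry per tensor-parallel rank; rank i loads shard i
    "checkpoints": [f"bloom-int8_tp_{rank:02d}.pt" for rank in range(8)],
}

with open("checkpoints.json", "w") as f:
    json.dump(index, f)
```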

The full BLOOM documentation can be found on the bigscience/bloom model card.

To use the weights in this repo, you can adapt the scripts found here to your needs (XXX: they are going to migrate to the HF Transformers code base soon, so the link will need to be updated once they move).
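Until those scripts move, the following is a minimal sketch of what loading this checkpoint with DeepSpeed-Inference can look like. It assumes the repo ships a DeepSpeed checkpoint index (named `ds_inference_config.json` here) and that one process is launched per GPU; take the exact file names and launch details from the actual scripts.

```python
# Minimal sketch: load the pre-sharded INT8 BLOOM checkpoint with
# DeepSpeed-Inference and generate. Launch one process per GPU, e.g.:
#   deepspeed --num_gpus 8 bloom_int8_inference.py
# The checkpoint-index filename below is an assumption; check the repo.
import os

import deepspeed
import torch
from huggingface_hub import snapshot_download
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/bloom-deepspeed-inference-int8"
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

# Download the sharded repo once; every rank reads its own shard from it.
repo_root = snapshot_download(model_name)
checkpoints_json = os.path.join(repo_root, "ds_inference_config.json")

# Tokenizer and architecture config come from the original BLOOM repo.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
config = AutoConfig.from_pretrained("bigscience/bloom")

# Build the model on the meta device so no full-size weights are
# materialized; DeepSpeed then loads each rank's INT8 shard directly.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)
model = model.eval()

model = deepspeed.init_inference(
    model,
    mp_size=world_size,              # tensor-parallel degree: 8 for 8 shards
    dtype=torch.int8,                # the shards are already quantized to INT8
    base_dir=repo_root,
    checkpoint=checkpoints_json,
    replace_with_kernel_inject=True, # swap in DeepSpeed's fused kernels
)
model = model.module  # unwrap the inference engine for generate()

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Building the model on the meta device avoids materializing full fp16 weights on every rank; DeepSpeed streams each rank's INT8 shard straight onto its GPU, which is what makes the pre-sharded layout fast to load.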
