Fixed the "addmm_impl_cpu_ not implemented for 'Half'" exception that occurred when running ALMA on CPU.
docs/translateModel.md
CHANGED
@@ -73,6 +73,7 @@ The 'mt5-zh-ja-en-trimmed' model is finetuned from Google's 'mt5-base' model. Th
 
 ## ALMA
 
+ALMA is an excellent translation model, but running it on CPU is strongly discouraged.
 ALMA is a many-to-many LLM-based translation model introduced by Haoran Xu and colleagues in September 2023. It is based on fine-tuning a large language model (LLaMA-2). The approach used for this model is referred to as Advanced Language Model-based trAnslator (ALMA). The paper is titled "A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models" ([arXiv:2309.11674](https://arxiv.org/abs/2309.11674)).
 The official support for ALMA currently includes 10 language directions: English↔German, English↔Czech, English↔Icelandic, English↔Chinese, and English↔Russian. However, the author hints that there might be surprises in other directions, so there are currently no restrictions on the languages that ALMA can be chosen for in the web UI.
 
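For readers landing on this doc change, here is a minimal, illustrative sketch (not code from this repo) of how an ALMA checkpoint can be used for translation with Hugging Face transformers. The model id `haoranxu/ALMA-7B` and the instruction-style prompt follow the public ALMA release; treat both as assumptions and adjust for your setup.

```python
# Illustrative only: load an ALMA checkpoint and translate one sentence.
# Model id and prompt format are assumptions based on the public ALMA release.
import torch
import transformers

model_id = "haoranxu/ALMA-7B"  # assumed public checkpoint on the Hugging Face Hub
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on GPU(s) when available
    torch_dtype=torch.float16,  # half precision is fine on GPU; see the CPU caveat above
)

# ALMA is prompted with an explicit translation instruction.
prompt = "Translate this from German to English:\nGerman: Maschinelles Lernen ist faszinierend.\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and keep only the generated translation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```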
src/translation/translationModel.py
CHANGED
@@ -163,8 +163,10 @@ class TranslationModel:
         self.transTokenizer = transformers.AutoTokenizer.from_pretrained(self.modelPath, use_fast=True)
         transModelConfig = transformers.AutoConfig.from_pretrained(self.modelPath)
         if self.device == "cpu":
+            # ALMA is an excellent translation model, but running it on CPU is strongly discouraged.
+            # Set torch_dtype=torch.float32 to prevent the exception "addmm_impl_cpu_ not implemented for 'Half'".
             transModelConfig.quantization_config["use_exllama"] = False
-            self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision, config=transModelConfig)
+            self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision, config=transModelConfig, torch_dtype=torch.float32)
         else:
             # transModelConfig.quantization_config["exllama_config"] = {"version":2} # After configuring to use ExLlamaV2, VRAM cannot be effectively released, which may be an issue. Temporarily not adopting the V2 version.
             self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision)
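Background on why the fix works: CPU builds of PyTorch lack a half-precision addmm kernel, so a model whose weights load as float16 fails at the first linear layer with "addmm_impl_cpu_ not implemented for 'Half'"; loading in float32 sidesteps this at the cost of memory and speed. Below is a minimal sketch of the pattern this commit applies, with a placeholder model path rather than the repo's actual configuration:

```python
# Sketch of the CPU fallback pattern used above; model_path is a placeholder.
import torch
import transformers

model_path = "your/gptq-quantized-model"  # placeholder, not a real repo id

config = transformers.AutoConfig.from_pretrained(model_path)
if not torch.cuda.is_available():
    # ExLlama kernels are GPU-only, so disable them for CPU inference.
    config.quantization_config["use_exllama"] = False
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_path,
        config=config,
        torch_dtype=torch.float32,  # the actual fix: avoid float16 ('Half') ops on CPU
        low_cpu_mem_usage=True,
    )
else:
    model = transformers.AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
```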