Update code/inference.py
Browse files — code/inference.py (+1 −1)
code/inference.py
CHANGED
```diff
@@ -72,7 +72,7 @@ def model_fn(model_dir, context=None):
         model_dir,
         device_map="auto",  # Automatically map layers across GPUs
         offload_folder=offload_dir,  # Offload parts to disk if needed
-        max_memory = {i: "15GiB" for i in range(torch.cuda.device_count())}  # Example for reducing usage per GPU
+        max_memory = {i: "15GiB" for i in range(torch.cuda.device_count())},  # Example for reducing usage per GPU
         no_split_module_classes=["QwenForCausalLM"]  # Ensure model is split across the GPUs
     )
```

The change adds the trailing comma that was missing after the `max_memory` keyword argument; without it the call was a syntax error, since `no_split_module_classes=...` followed on the next line.